This paper addresses the I/O interference problem that arises during online reconstruction of erasure-coded storage clusters, where user I/Os compete with reconstruction I/Os for both disk and network bandwidth. We propose a redirection scheme called RAM-RS to minimize the interference between user and reconstruction requests. RAM-RS redirects user reads and writes targeted at failed nodes to an RS-coded RAM region, which is formed from pre-allocated main memory on the surviving nodes and protected by Reed-Solomon (RS) coding. The RS-coded RAM region quickly serves all user read and write misses; a rebuilding node can therefore devote its disk and network bandwidth to node reconstruction.
The RAM region substantially reduces the amount of data that the rebuilding node must rebuild, because (1) missed writes are buffered in the RAM region and (2) missed reads are satisfied by having surviving nodes co-rebuild the failed blocks. We build two Markov models to estimate the reliability of the RAM-RS system. Modeling results demonstrate that the mean time to data loss (MTTDL) of the RS-coded RAM region in a storage cluster is larger than that of the same cluster consisting of the surviving nodes. We implement both RAM-RS and the traditional Redirection scheme in an erasure-coded storage cluster, on which we replay real-world I/O traces. Experimental results show that, compared with the Redirection scheme on a 9-node storage cluster, RAM-RS improves user response time and reconstruction time by factors of 1.78 and 1.20, respectively.
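
The redirection idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Cluster`, `write`, `read`, and the `rebuild` callback are hypothetical, and the RS-coded RAM region is stood in for by a plain in-memory dictionary, so the Reed-Solomon encoding across surviving nodes is not modeled. The sketch only shows the routing decision: requests for healthy nodes go to disk, missed writes are buffered in RAM, and missed reads are rebuilt once by surviving nodes and then served from RAM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Key = Tuple[int, int]  # (node id, block id)


@dataclass
class Cluster:
    failed: Set[int]                                       # ids of failed nodes
    ram_region: Dict[Key, bytes] = field(default_factory=dict)  # stand-in for the RS-coded RAM region
    disks: Dict[Key, bytes] = field(default_factory=dict)       # stand-in for on-disk blocks


def write(cluster: Cluster, node: int, block: int, data: bytes) -> str:
    """Route a user write; a write to a failed node is buffered in RAM."""
    if node in cluster.failed:
        cluster.ram_region[(node, block)] = data
        return "ram"
    cluster.disks[(node, block)] = data
    return "disk"


def read(cluster: Cluster, node: int, block: int,
         rebuild: Callable[[int, int], bytes]) -> Tuple[bytes, str]:
    """Route a user read; a read of a failed node is served from RAM.

    `rebuild` is an assumed callback that represents surviving nodes
    co-rebuilding the missing block (decoding is not modeled here).
    """
    key = (node, block)
    if node not in cluster.failed:
        return cluster.disks[key], "disk"
    if key in cluster.ram_region:
        return cluster.ram_region[key], "ram"
    # First miss on this block: rebuild it and keep the result in RAM.
    cluster.ram_region[key] = rebuild(node, block)
    return cluster.ram_region[key], "rebuilt"
```

In this simplified picture, every block that lands in `ram_region` is one the rebuilding node does not need to serve for user requests while reconstruction is in progress, which is the source of the reduced interference described in the abstract.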