Data deduplication is an emerging and widely employed method for current storage systems. As this technology is gradually applied in inline scenarios such as with virtual machines and cloud storage systems, this study proposes a novel deduplication architecture called I-sieve. The goal of I-sieve is to realize a high performance data sieve system based on iSCSI in the cloud storage system. We also design the corresponding index and mapping tables and present a multi-level cache using a solid state drive to reduce RAM consumption and to optimize lookup performance.
A prototype of I-sieve is implemented based on the open source iSCSI target, and many experiments have been conducted driven by virtual machine images and testing tools. The evaluation results show excellent deduplication and foreground performance. More importantly, I-sieve can co-exist with the existing deduplication systems as long as they support the iSCSI protocol.