iDedup: latency-aware, inline data deduplication for primary storage (paper)

2012, cited by 254
Advanced Data Storage Technologies; Cloud Computing and Resource Management; Cloud Data Security Solutions

Abstract

Deduplication technologies are increasingly being deployed to reduce cost and increase space efficiency in corporate data centers. However, prior research has not applied deduplication techniques inline to the request path for latency-sensitive, primary workloads. This is primarily due to the extra latency these techniques introduce. Inherently, deduplicating data on disk causes fragmentation that increases seeks for subsequent sequential reads of the same data, thus increasing latency. In addition, deduplicating data requires extra disk IOs to access on-disk deduplication metadata. In this paper, we propose an inline deduplication solution, iDedup, for primary workloads, while minimizing extra IOs and seeks. Our algorithm is based on two key insights from real-world workloads: i) spatial locality exists in duplicated primary data; and ii) temporal locality exists in the access patterns of duplicated data. Using the first insight, we selectively deduplicate only sequences of disk blocks. This reduces fragmentation and amortizes the seeks caused by deduplication. The second insight allows us to replace the expensive, on-disk, deduplication metadata with a smaller, in-memory cache. These techniques enable us to trade off capacity savings for performance, as demonstrated in our evaluation with real-world workloads. Our evaluation shows that iDedup achieves 60-70% of the maximum deduplication with less than a 5% CPU overhead and a 2-4% latency impact.
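The two insights in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `FingerprintCache` and `write_blocks`, the `threshold` parameter, the use of SHA-256 fingerprints, LRU eviction, and the append-only list standing in for a disk are all assumptions made for illustration. The sketch deduplicates only runs of at least `threshold` incoming blocks whose duplicates are also contiguous on disk, and it keeps fingerprints in a bounded in-memory cache rather than in on-disk metadata.

```python
import hashlib
from collections import OrderedDict


class FingerprintCache:
    """Bounded in-memory LRU map: block fingerprint -> disk block number.

    Hypothetical stand-in for the "smaller, in-memory cache" in the abstract.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, fp):
        dbn = self.entries.get(fp)
        if dbn is not None:
            self.entries.move_to_end(fp)  # mark as recently used
        return dbn

    def put(self, fp, dbn):
        self.entries[fp] = dbn
        self.entries.move_to_end(fp)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def write_blocks(blocks, disk, cache, threshold):
    """Write `blocks` and return the disk block number used for each one.

    `disk` is an append-only list of blocks (an assumption of this sketch).
    A run of incoming blocks is deduplicated only if it has at least
    `threshold` blocks whose cached duplicates are consecutive on disk.
    """
    fps = [hashlib.sha256(b).digest() for b in blocks]
    hits = [cache.get(fp) for fp in fps]  # looked up once, before any writes
    placement = []
    i, n = 0, len(blocks)
    while i < n:
        j = i
        if hits[i] is not None:
            # Extend the run while duplicates stay contiguous on disk.
            j = i + 1
            while j < n and hits[j] is not None and hits[j] == hits[j - 1] + 1:
                j += 1
        if j - i >= threshold:
            placement.extend(hits[i:j])  # share the existing on-disk run
            i = j
        else:
            disk.append(blocks[i])  # too short to dedup: write a new copy
            dbn = len(disk) - 1
            cache.put(fps[i], dbn)
            placement.append(dbn)
            i += 1
    return placement


# Usage: six distinct 4 KiB blocks, minimum run length of 2.
A, B, C, D, X, Y = (bytes([k]) * 4096 for k in range(6))
disk, cache = [], FingerprintCache(capacity=100)
first = write_blocks([A, B, C, D], disk, cache, threshold=2)   # all new
second = write_blocks([B, C, X], disk, cache, threshold=2)     # B, C shared
third = write_blocks([A, Y, C], disk, cache, threshold=2)      # nothing shared
```

In this sketch, raising `threshold` gives up some capacity savings (isolated duplicates such as `A` and `C` in the third write are stored again) in exchange for less fragmentation on later sequential reads, which mirrors the capacity-for-performance trade-off the abstract describes.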
