Frugal ECC: Efficient and Versatile Memory Error Protection through Fine-Grained Compression

Because main memory is vulnerable to errors and failures, large-scale systems and critical servers utilize error checking and correcting (ECC) mechanisms to meet their reliability requirements. We propose a novel mechanism, Frugal ECC (FECC), that combines ECC with fine-grained compression to provide versatile protection that can be both stronger and lower overhead than current schemes, without sacrificing performance. FECC compresses main memory at cache-block granularity, using any left over space to store ECC information. Compressed data and its ECC information are then frequently read with a single access even without redundant memory chips; blocks that do not compress sufficiently require additional storage and accesses. As examples of FECC, we present chipkill-correct and chipkill-level ECCs on a 64-bit non-ECC DIMM with ×4 DRAM chips. We also describe the first true chipkill-correct ECC for ×8 devices using conventional ECC DIMMs. FECC relies on a new Coverage-oriented Compression (CoC) scheme that we developed specifically for the modest compression needs of ECC and for floating-point data. CoC can sufficiently compress 84% of all accesses in SPEC Int, 93% in SPEC FP, 95% in SPLASH2X, and nearly 100% in the NPB suites. With such high compression coverage, the worst case performance degradation from FECC is less than 3.7% while reliability is slightly improved and energy consumption reduced by about 50% for true chipkill-correct.

Authors: 
Jungrae Kim (UT)
Publication Date: 
Sunday, November 1, 2015
Research Area: