Unity ECC: Unified Memory Protection Against Bit and Chip Errors

DRAM vendors use On-Die Error Correction Codes (OD-ECC) to correct random bit errors internally, while system companies use Rank-Level ECC (RL-ECC) to protect data against chip errors. Protecting against the two error types separately raises the redundancy ratio to 32.8% in DDR5 and incurs significant performance penalties. This paper proposes a novel RL-ECC, Unity ECC, that can correct both single-chip and double-bit error patterns. Unity ECC corrects double-bit errors using the syndromes left unused by single-chip correction. Our evaluation shows that Unity ECC without OD-ECC can provide the same reliability level as Chipkill RL-ECC with OD-ECC. Moreover, eliminating OD-ECC can significantly improve system performance and reduce DRAM energy and area.
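
The two quantitative ideas in the abstract can be checked with a short back-of-the-envelope sketch. It rests on assumptions that the abstract does not state: that OD-ECC adds 8 parity bits per 128 data bits, that RL-ECC adds 8 ECC bits per 32 data bits on a DDR5 sub-channel, and that the rank-level codeword has 10 symbols of 8 bits with 16 parity bits. The function names are hypothetical, and the count only shows that enough syndromes exist; it does not construct a code that actually assigns them uniquely.

```python
from math import comb


def combined_redundancy(od_data, od_parity, rl_data, rl_parity):
    """Redundancy ratio when OD-ECC and RL-ECC overheads stack multiplicatively."""
    od_factor = 1 + od_parity / od_data
    rl_factor = 1 + rl_parity / rl_data
    return od_factor * rl_factor - 1


def syndrome_budget(num_symbols, symbol_bits, parity_bits):
    """Count syndromes needed for single-symbol plus double-bit correction.

    Assumed layout: one symbol per chip, so a single-chip error is a
    single-symbol error. Returns (total, single_symbol, cross_double_bit, spare).
    """
    total = 2 ** parity_bits
    # Every nonzero error value in every symbol position.
    single_symbol = num_symbols * (2 ** symbol_bits - 1)
    # Double-bit errors inside one symbol are already single-symbol errors,
    # so only pairs that span two different symbols need new syndromes.
    codeword_bits = num_symbols * symbol_bits
    cross_double_bit = comb(codeword_bits, 2) - num_symbols * comb(symbol_bits, 2)
    # One extra syndrome (all zeros) is reserved for "no error".
    spare = total - (1 + single_symbol + cross_double_bit)
    return total, single_symbol, cross_double_bit, spare


if __name__ == "__main__":
    print(f"combined redundancy: {combined_redundancy(128, 8, 32, 8):.1%}")
    print("syndrome budget:", syndrome_budget(10, 8, 16))
```

Under these assumptions the stacked overhead comes to 32.8%, matching the figure in the abstract, and single-symbol correction uses only 2,550 of the 65,536 available syndromes, which leaves ample room for the 2,880 double-bit patterns that span two symbols.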

Authors

Dongwhee Kim (Sungkyunkwan University)
Jaeyoon Lee (Sungkyunkwan University)
Wonyeong Jung (Sungkyunkwan University)
Jungrae Kim (Sungkyunkwan University)
