High Performing Cache Hierarchies for Server Workloads -- Relaxing Inclusion to Capture the Latency Benefits of Exclusive Caches

Publication image

Increasing transistor density enables adding more on-die cache real-estate. However, devoting more space to the shared last- level-cache (LLC) causes the memory latency bottleneck to move from memory access latency to shared cache access latency. As such, applications whose working set is larger than the smaller caches spend a large fraction of their execution time on shared cache access latency. To address this problem, this paper investigates increasing the size of smaller private caches in the hierarchy as opposed to increasing the shared LLC. Doing so improves average cache access latency for workloads whose working set fits into the larger private cache while retaining the benefits of a shared LLC. The consequence of increasing the size of private caches is to relax inclusion and build exclusive hierarchies. Thus, for the same total caching capacity, an exclusive cache hierarchy provides better cache access latency.

We observe that server workloads benefit tremendously from an exclusive hierarchy with large private caches. This is primarily because large private caches accommodate the large code working- sets of server workloads. For a 16-core CMP, an exclusive cache hierarchy improves server workload performance by 5-12% as compared to an equal capacity inclusive cache hierarchy. The paper also presents directions for further research to maximize performance of exclusive cache hierarchies.

Authors

Joseph Nuzman (Intel)
Adrian Moga (Intel)
Simon C. Steely Jr. (Intel)

Publication Date

Research Area

Uploaded Files