Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures

Recently there has been interest in accelerating non-vectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: (A) how to efficiently control each processing element (PE) in the system, and (B) how to support inter-PE communication without the overheads of traditional cache-coherent shared memory. In this paper, we explore solving these problems using triggered instructions and latency-insensitive channels. Triggered instructions eliminate the program counter entirely and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information while also enabling flexible code placement and improving tolerance for variable-latency events such as cache accesses. Together, these approaches provide a unified mechanism to avoid over-serialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading.
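
The abstract does not define an instruction format, so the following is only a minimal Python sketch, under our own assumptions, of how a triggered-instruction PE fed by latency-insensitive channels might be modeled. Everything in it is hypothetical and is not the paper's ISA: the `Channel`, `Instr`, and `MergePE` names, the predicate names `cmp_done`, `a_le_b`, and `halted`, the priority-order scheduler, and the use of `math.inf` as an end-of-stream token. Each instruction is a guard over predicate bits and channel full/empty status, paired with an action; there is no program counter and no branch instruction. The PE merges two sorted streams while the producers push at arbitrary, unrelated rates.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional


class Channel:
    """Bounded FIFO: endpoints test only full/empty, never timing (latency-insensitive)."""
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buf: Deque[float] = deque()
    def can_push(self) -> bool: return len(self.buf) < self.capacity
    def can_pop(self) -> bool: return len(self.buf) > 0
    def push(self, v: float) -> None: self.buf.append(v)
    def peek(self) -> float: return self.buf[0]
    def pop(self) -> float: return self.buf.popleft()


@dataclass
class Instr:
    name: str
    trigger: Callable[["MergePE"], bool]  # guard over predicates + channel status
    action: Callable[["MergePE"], None]   # datapath work + predicate updates


class MergePE:
    """Hypothetical PE that merges two sorted input streams using triggered instructions."""
    def __init__(self, a: Channel, b: Channel, out: Channel) -> None:
        self.a, self.b, self.out = a, b, out
        # Architectural predicate state replaces the program counter.
        self.p: Dict[str, bool] = {"cmp_done": False, "a_le_b": False, "halted": False}
        self.instrs: List[Instr] = [  # listed in priority order
            Instr("finish",
                  lambda s: not s.p["cmp_done"] and s.a.can_pop() and s.b.can_pop()
                  and s.a.peek() == math.inf and s.b.peek() == math.inf,
                  lambda s: s.p.update(halted=True)),
            Instr("compare",
                  lambda s: not s.p["cmp_done"] and s.a.can_pop() and s.b.can_pop(),
                  lambda s: s.p.update(cmp_done=True, a_le_b=s.a.peek() <= s.b.peek())),
            Instr("send_a",
                  lambda s: s.p["cmp_done"] and s.p["a_le_b"] and s.out.can_push(),
                  lambda s: (s.out.push(s.a.pop()), s.p.update(cmp_done=False))),
            Instr("send_b",
                  lambda s: s.p["cmp_done"] and not s.p["a_le_b"] and s.out.can_push(),
                  lambda s: (s.out.push(s.b.pop()), s.p.update(cmp_done=False))),
        ]

    def step(self) -> Optional[str]:
        """One cycle: fire the highest-priority instruction whose trigger is true."""
        if self.p["halted"]:
            return None
        for ins in self.instrs:
            if ins.trigger(self):
                ins.action(self)
                return ins.name
        return None  # nothing ready: stall until some channel changes state


def run(xs: List[float], ys: List[float], cap: int = 2, max_cycles: int = 1000) -> List[float]:
    a, b, out = Channel(cap), Channel(cap), Channel(cap)
    pe = MergePE(a, b, out)
    src_a = deque(xs + [math.inf])  # inf: hypothetical end-of-stream token
    src_b = deque(ys + [math.inf])
    result: List[float] = []
    for cycle in range(max_cycles):
        # Producers run at different rates: `a` every cycle, `b` every third cycle.
        if src_a and a.can_push():
            a.push(src_a.popleft())
        if cycle % 3 == 0 and src_b and b.can_push():
            b.push(src_b.popleft())
        pe.step()
        if out.can_pop():  # consumer drains one value per cycle
            result.append(out.pop())
        if pe.p["halted"]:
            break
    while out.can_pop():
        result.append(out.pop())
    return result


print(run([1, 4, 7, 9], [2, 3, 8]))  # → [1, 2, 3, 4, 7, 8, 9]
```

Because every instruction waits only on channel status and predicate bits, the merged output is the same whether the channels hold one entry or many, and regardless of how often each producer delivers data; only the number of stall cycles changes. This is the property that, in this sketch, stands in for tolerance of variable-latency events.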

Our analysis shows that a spatial accelerator using triggered instructions and latency-insensitive channels can achieve 8× greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that, compared with a program-counter-style baseline, triggered control reduces the number of static and dynamic instructions on the critical paths by 62% and 64%, respectively, and increases the performance of the spatial programming approach by 2.0×.

Authors

Michael Adler (Intel)
Bushra Ahsan (Intel)
Randy Almon (Intel)
Kermin Fleming (Intel)
Mohit Gambhir (Intel)
Tushar Krishna (Intel)
Stephen Maresh (Intel)
Vladimir Pavlov (Intel)
Rachid Rayess (Intel)
Antonia Zhai (University of Minnesota)