Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions

We present control replication, a technique for generating high-performance and scalable SPMD code from implicitly parallel programs. In contrast to traditional parallel programming models that require the programmer to explicitly manage threads and the communication and synchronization between them, implicitly parallel programs have sequential execution semantics and naturally avoid the pitfalls of explicitly parallel code. However, without optimizations to distribute control overhead, scalability is often poor.

Performance on distributed-memory machines is especially sensitive to communication and synchronization in the program, and thus optimizations for these machines require an intimate understanding of a program’s memory accesses. Control replication achieves particularly effective and predictable results by leveraging language support for first-class data partitioning in the source programming model. We evaluate an implementation of control replication for Regent and show that it achieves up to 99% parallel efficiency at 1024 nodes with absolute performance comparable to hand-written MPI(+X) codes.

Authors

Elliott Slaughter (Stanford University)

Wonchan Lee (Stanford University)

Sean Treichler (NVIDIA)

Wen Zhang (Stanford University)

Michael Bauer

Galen Shipman (Los Alamos National Laboratory)

Patrick McCormick (Los Alamos National Laboratory)

Alex Aiken (Stanford University)

Publication Date

Sunday, November 12, 2017

Published in

International Conference for High Performance Computing and Communications (SC…

Research Area

High Performance Computing

Programming Languages, Systems and Tools

External Links

ACM Digital Library

Uploaded Files

Published manuscript (555.4 KB)

Copyright

Copyright by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library http://www.acm.org/dl/.