The rapid growth in simulation data requires large-scale parallel implementations of scientific analysis and visualization algorithms, both to produce results within an acceptable timeframe and to enable in situ deployment. However, efficient and scalable implementations, especially of more complex analysis approaches, require not only advanced algorithms, but also an in-depth knowledge of the underlying runtime. Furthermore, different machine configurations and different applications may favor different runtimes, i.e., MPI vs Charm++ vs Legion, etc., and different hardware architectures. This diversity makes developing and maintaining a broadly applicable analysis software infrastructure challenging.
We address some of these problems by explicitly separating the implementation of individual tasks of an algorithm from the dataflow connecting these tasks. In particular, we present an embedded domain specific language (EDSL) to describe algorithms using a new task graph abstraction. This task graph is then executed on top of one of several available runtimes (MPI, Charm++, Legion) using a thin layer of library calls. We demonstrate the flexibility and performance of this approach using three different large scale analysis and visualization use cases, i.e., topological analysis, rendering and compositing dataflow, and image registration of large microscopy scans. Despite the unavoidable overheads of a generic solution, our approach demonstrates performance portability at scale, and, in some cases, outperforms hand-optimized implementations.
This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to firstname.lastname@example.org.