Reto Achermann
Assistant Professor
Systopia Lab
Department of Computer Science
University of British Columbia
Shoal: Smart Allocation and Replication of Memory for Parallel Programs
Authors
Stefan Kaestle, Reto Achermann, Timothy Roscoe and Tim Harris
Venue
Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC'15)
Abstract
Modern NUMA multi-core machines exhibit complex latency and throughput characteristics, making it hard to allocate memory optimally for a given program's access patterns. However, sub-optimal allocation can significantly impact performance of parallel programs.
We present an array abstraction that allows data placement to be automatically inferred from program analysis, and implement the abstraction in Shoal, a runtime library for parallel programs on NUMA machines. In Shoal, arrays can be automatically replicated, distributed, or partitioned across NUMA domains based on annotating memory allocation statements to indicate access patterns. We further show how such annotations can be automatically provided by compilers for high-level domain-specific languages (for example, the Green-Marl graph language). Finally, we show how Shoal can exploit additional hardware such as programmable DMA copy engines to further improve parallel program performance.
We demonstrate significant performance benefits from automatically selecting a good array implementation based on memory access patterns and machine characteristics. We present two case studies: (i) Green-Marl, a graph analytics workload using automatically annotated code based on information extracted from the high-level program, and (ii) a manually annotated version of the PARSEC Streamcluster benchmark.
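To make the idea of choosing an array implementation from access-pattern annotations more concrete, the following is a minimal Python sketch. It is not Shoal's API or its actual selection policy: the names `AccessHints`, `Placement`, `choose_placement`, and `partition_bounds`, as well as the decision rules themselves, are assumptions made purely for illustration. The sketch only shows how hints attached to an allocation site might map onto the three placements named in the abstract (replicated, partitioned, distributed).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Placement(Enum):
    REPLICATED = "replicated"    # one full copy of the array per NUMA domain
    PARTITIONED = "partitioned"  # one contiguous block per NUMA domain
    DISTRIBUTED = "distributed"  # memory spread across all NUMA domains


@dataclass(frozen=True)
class AccessHints:
    """Hypothetical annotation attached to an allocation site."""
    read_only: bool        # array is not written after initialisation
    indexed_by_loop: bool  # each thread only touches its own index range
    size_bytes: int        # size of the array


def choose_placement(hints: AccessHints, free_bytes_per_node: int) -> Placement:
    """Pick a placement from the hints (illustrative rules, not Shoal's)."""
    # Replication only pays off if nobody writes and every node can hold a copy.
    if hints.read_only and hints.size_bytes <= free_bytes_per_node:
        return Placement.REPLICATED
    # If threads work on disjoint index ranges, keep each range node-local.
    if hints.indexed_by_loop:
        return Placement.PARTITIONED
    # Otherwise spread the array to balance load on memory controllers.
    return Placement.DISTRIBUTED


def partition_bounds(length: int, num_nodes: int) -> List[Tuple[int, int]]:
    """Split [0, length) into num_nodes contiguous half-open ranges."""
    base, extra = divmod(length, num_nodes)
    bounds = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional element each.
        end = start + base + (1 if node < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds
```

For example, under these assumed rules a small read-only array would be replicated, while a large array that each thread accesses through its own loop index range would be partitioned, with `partition_bounds` giving the index range owned by each NUMA domain.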
Bibtex
@inproceedings{Kaestle:2015:SSA,
  author = {Kaestle, Stefan and Achermann, Reto and Roscoe, Timothy and Harris, Tim},
  booktitle = {Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference},
  id = {Kaestle:2015:SSA},
  isbn = {978-1-931971-225},
  location = {Santa Clara, CA},
  pages = {263--276},
  publisher = {USENIX Association},
  series = {USENIX ATC'15},
  title = {Shoal: Smart Allocation and Replication of Memory for Parallel Programs},
  url = {http://dl.acm.org/citation.cfm?id=2813767.2813787},
  year = {2015}
}