Shoal: Smart Allocation and Replication of Memory for Parallel Programs

Authors

Stefan Kaestle, Reto Achermann, Timothy Roscoe and Tim Harris

Venue

Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC'15)

Abstract

Modern NUMA multi-core machines exhibit complex latency and throughput characteristics, making it hard to allocate memory optimally for a given program's access patterns. However, sub-optimal allocation can significantly impact performance of parallel programs.

We present an array abstraction that allows data placement to be automatically inferred from program analysis, and implement the abstraction in Shoal, a runtime library for parallel programs on NUMA machines. In Shoal, arrays can be automatically replicated, distributed, or partitioned across NUMA domains based on annotating memory allocation statements to indicate access patterns. We further show how such annotations can be automatically provided by compilers for high-level domain-specific languages (for example, the Green-Marl graph language). Finally, we show how Shoal can exploit additional hardware such as programmable DMA copy engines to further improve parallel program performance.

We demonstrate significant performance benefits from automatically selecting a good array implementation based on memory access patterns and machine characteristics. We present two case studies: (i) Green-Marl, a graph analytics workload using automatically annotated code based on information extracted from the high-level program, and (ii) a manually annotated version of the PARSEC Streamcluster benchmark.
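The idea of selecting an array implementation from an access-pattern annotation can be illustrated with a small sketch. This is not Shoal's API or its actual selection policy: the names `Layout`, `AccessPattern`, and `choose_layout`, the two annotation flags, and the decision order are all assumptions made for illustration. It only shows how a runtime might map an annotation plus one machine characteristic (free memory per NUMA node) onto the three placements named in the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class Layout(Enum):
    """The three placements named in the abstract."""
    REPLICATED = "replicated"    # one full copy per NUMA node
    PARTITIONED = "partitioned"  # contiguous chunk per NUMA node
    DISTRIBUTED = "distributed"  # pages spread across all nodes


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical annotation attached to an allocation statement."""
    read_only: bool        # array is never written after initialization
    indexed_by_loop: bool  # every access uses the parallel loop index


def choose_layout(pattern: AccessPattern,
                  array_bytes: int,
                  free_bytes_per_node: int) -> Layout:
    """Pick a layout from the annotation and available memory (assumed policy)."""
    # Replication only pays off when there are no writes to keep coherent
    # and a full copy fits on every node.
    if pattern.read_only and array_bytes <= free_bytes_per_node:
        return Layout.REPLICATED
    # If each thread only touches its own index range, its chunk can be
    # placed on the node where that thread runs.
    if pattern.indexed_by_loop:
        return Layout.PARTITIONED
    # Otherwise spread pages over all nodes to balance memory-controller load.
    return Layout.DISTRIBUTED
```

For example, under these assumptions a read-only 1 MiB lookup table on a machine with 1 GiB free per node would be replicated, while a written array indexed only by the loop variable would be partitioned.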

Bibtex

@inproceedings{Kaestle:2015:SSA,
 author = {Kaestle, Stefan and Achermann, Reto and Roscoe, Timothy and Harris, Tim},
 booktitle = {Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference},
 id = {Kaestle:2015:SSA},
 isbn = {978-1-931971-225},
 location = {Santa Clara, CA},
 pages = {263--276},
 publisher = {USENIX Association},
 series = {USENIX ATC'15},
 title = {Shoal: Smart Allocation and Replication of Memory for Parallel Programs},
 url = {http://dl.acm.org/citation.cfm?id=2813767.2813787},
 year = {2015}
}

Contact

The University of British Columbia
Department of Computer Science
2366 Main Mall
ICICS Building, Office 341
Vancouver, BC V6T 1Z4
Canada

achreto [at] cs.ubc.ca
+1 604 827 2446