Reto Achermann
Assistant Professor
Systems Research Group
TUM School of Computation, Information and Technology
Shoal: Smart Allocation and Replication of Memory for Parallel Programs
Authors
Stefan Kaestle, Reto Achermann, Timothy Roscoe and Tim Harris
Venue
Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC'15)
Abstract
Modern NUMA multi-core machines exhibit complex latency and throughput characteristics, making it hard to allocate memory optimally for a given program's access patterns. However, sub-optimal allocation can significantly impact performance of parallel programs.
We present an array abstraction that allows data placement to be automatically inferred from program analysis, and implement the abstraction in Shoal, a runtime library for parallel programs on NUMA machines. In Shoal, arrays can be automatically replicated, distributed, or partitioned across NUMA domains based on annotating memory allocation statements to indicate access patterns. We further show how such annotations can be automatically provided by compilers for high-level domain-specific languages (for example, the Green-Marl graph language). Finally, we show how Shoal can exploit additional hardware such as programmable DMA copy engines to further improve parallel program performance.
We demonstrate significant performance benefits from automatically selecting a good array implementation based on memory access patterns and machine characteristics. We present two case studies: (i) Green-Marl, a graph analytics workload using automatically annotated code based on information extracted from the high-level program, and (ii) a manually annotated version of the PARSEC Streamcluster benchmark.
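To make the idea of choosing an array implementation from access-pattern annotations more concrete, the following is a minimal illustrative sketch. It is not Shoal's API or its actual selection policy: the names `Placement`, `AccessPattern`, `choose_placement`, and `partition_bounds`, as well as the specific decision rules, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    """Hypothetical placement options mirroring those named in the abstract."""
    REPLICATED = "replicated"    # one full copy per NUMA domain
    PARTITIONED = "partitioned"  # each domain holds the slice its threads use
    DISTRIBUTED = "distributed"  # data spread across all domains


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical annotation attached to an array allocation."""
    read_only: bool          # array is not written after initialization
    indexed_by_thread: bool  # each thread touches only its own contiguous slice


def choose_placement(pattern, array_bytes, free_bytes_per_node):
    """Pick a placement from the annotation (assumed rules, not Shoal's)."""
    if pattern.indexed_by_thread:
        # Accesses stay within a per-thread slice, so keep each slice local.
        return Placement.PARTITIONED
    if pattern.read_only and array_bytes <= free_bytes_per_node:
        # A read-only array that fits on every node can be copied to each one.
        return Placement.REPLICATED
    # Otherwise spread the array to balance load on the memory controllers.
    return Placement.DISTRIBUTED


def partition_bounds(length, num_nodes):
    """Split [0, length) into num_nodes contiguous, near-equal slices."""
    base, extra = divmod(length, num_nodes)
    bounds = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes each take one additional element.
        size = base + (1 if node < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds
```

Under these assumed rules, a read-only array that fits in each node's free memory would be replicated, an array accessed only through per-thread slices would be partitioned, and everything else would fall back to distribution. A real runtime would also weigh machine characteristics such as interconnect bandwidth, which this sketch ignores.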
Bibtex
@inproceedings{Kaestle:2015:SSA,
author = {Kaestle, Stefan and Achermann, Reto and Roscoe, Timothy and Harris, Tim},
booktitle = {Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference},
id = {Kaestle:2015:SSA},
isbn = {978-1-931971-225},
location = {Santa Clara, CA},
pages = {263--276},
publisher = {USENIX Association},
series = {USENIX ATC'15},
title = {Shoal: Smart Allocation and Replication of Memory for Parallel Programs},
url = {http://dl.acm.org/citation.cfm?id=2813767.2813787},
year = {2015}
}
Prof. Reto Achermann
I01: Chair of Distributed Systems and Operating Systems (aka Systems Research Group)
1st Floor, 7th Finger
School of Computation, Information, and Technology (CIT)
Technical University of Munich (TUM)
Boltzmannstr. 3
85748 Garching bei München
Germany
firstname.lastname [at] cit.tum.de


