NrOS: Effective Replication and Sharing in an Operating System

Authors

Ankit Bhardwaj, Chinmay Kulkarni, Reto Achermann, Irina Calciu, Sanidhya Kashyap, Ryan Stutsman, Amy Tai and Gerd Zellweger

Venue

Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI'21)

Abstract

Writing a correct operating system kernel is notoriously hard. Kernel code requires manual memory management and type-unsafe operations, and it must efficiently handle complex, asynchronous events. In addition, increasing CPU core counts further complicate kernel development. Typically, monolithic kernels share state across cores and rely on one-off synchronization patterns that are specialized for each kernel structure or subsystem. Hence, kernel developers are constantly refining synchronization within OS kernels to improve scalability, at the risk of introducing subtle bugs.

We present NrOS, a new OS kernel with a safer approach to synchronization that runs many POSIX programs. NrOS is primarily constructed as a simple, sequential kernel with no concurrency, making it easier to develop and reason about its correctness. This kernel is scaled across NUMA nodes using node replication, a scheme inspired by state machine replication in distributed systems. NrOS replicates kernel state on each NUMA node and uses operation logs to maintain strong consistency between replicas. Cores can safely and concurrently read from their local kernel replica, eliminating remote NUMA accesses.
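The replication scheme described above can be illustrated with a deliberately simplified Rust sketch. NrOS itself is written in Rust, but this is not its code. The names `KernelState`, `Op`, `Log`, and `Replica` are hypothetical; each `Replica` stands in for one NUMA node's copy of the kernel state. The shared operation log here is an unbounded `Vec` behind a `Mutex`, whereas a production design would bound the log and batch updates. The sketch only shows the core idea: mutations are appended to a single shared log and replayed in log order by every replica, and reads are served from the local replica once it has caught up to the log tail.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

// Hypothetical sequential "kernel" state: plain single-threaded code, no locks inside.
#[derive(Default)]
struct KernelState {
    map: HashMap<u64, u64>,
}

// Mutating operations that are recorded in the shared log.
#[derive(Clone, Copy)]
enum Op {
    Put(u64, u64),
    Remove(u64),
}

impl KernelState {
    fn apply(&mut self, op: Op) {
        match op {
            Op::Put(k, v) => {
                self.map.insert(k, v);
            }
            Op::Remove(k) => {
                self.map.remove(&k);
            }
        }
    }
    fn get(&self, k: u64) -> Option<u64> {
        self.map.get(&k).copied()
    }
}

// Simplified shared, append-only operation log (assumption: unbounded Vec + Mutex).
#[derive(Default)]
struct Log {
    ops: Mutex<Vec<Op>>,
}

// One replica per (hypothetical) NUMA node: local state plus how many log entries it has applied.
struct Replica {
    log: Arc<Log>,
    inner: RwLock<(KernelState, usize)>,
}

impl Replica {
    fn new(log: Arc<Log>) -> Self {
        Replica { log, inner: RwLock::new((KernelState::default(), 0)) }
    }

    // Replay, in log order, every entry this replica has not applied yet.
    fn sync(&self) {
        let mut guard = self.inner.write().unwrap();
        let ops = self.log.ops.lock().unwrap();
        let (state, applied) = &mut *guard;
        for op in &ops[*applied..] {
            state.apply(*op);
        }
        *applied = ops.len();
    }

    // Updates go through the shared log, so all replicas apply them in the same order.
    fn execute_mut(&self, op: Op) {
        self.log.ops.lock().unwrap().push(op); // log lock released at end of statement
        self.sync();
    }

    // Reads use the local replica; they only replay the log if the replica is behind.
    fn execute_read(&self, k: u64) -> Option<u64> {
        let tail = self.log.ops.lock().unwrap().len();
        {
            let guard = self.inner.read().unwrap(); // shared lock: concurrent local readers
            if guard.1 >= tail {
                return guard.0.get(k);
            }
        }
        self.sync();
        self.inner.read().unwrap().0.get(k)
    }
}

fn main() {
    let log = Arc::new(Log::default());
    let replicas: Vec<Arc<Replica>> =
        (0..2).map(|_| Arc::new(Replica::new(log.clone()))).collect();

    // Two threads per "node" issue updates through their local replica.
    let mut handles = Vec::new();
    for (node, r) in replicas.iter().enumerate() {
        for t in 0..2u64 {
            let r = Arc::clone(r);
            handles.push(thread::spawn(move || {
                let key = node as u64 * 10 + t;
                r.execute_mut(Op::Put(key, key * 100));
            }));
        }
    }
    for h in handles {
        h.join().unwrap();
    }

    // Every replica observes every update, regardless of which node issued it.
    for r in &replicas {
        assert_eq!(r.execute_read(11), Some(1100));
    }
    replicas[0].execute_mut(Op::Remove(11));
    assert_eq!(replicas[1].execute_read(11), None);
    println!("replicas consistent");
}
```

Because `KernelState::apply` is ordinary sequential code, all concurrency in this sketch is confined to `Log` and `Replica`. That mirrors the separation the abstract describes: the kernel logic stays sequential, and replication across nodes supplies scalability, with local reads avoiding access to another node's copy of the state.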

Our evaluation shows that NrOS scales to 96 cores with performance that nearly always dominates Linux at scale, in some cases by orders of magnitude, while retaining much of the simplicity of a sequential kernel.

Bibtex

@inproceedings{Bhardwaj:2021:NERS,
 author = {Bhardwaj, Ankit and Kulkarni, Chinmay and Achermann, Reto and Calciu, Irina and Kashyap, Sanidhya and Stutsman, Ryan and Tai, Amy and Zellweger, Gerd},
 booktitle = {Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation},
 id = {Bhardwaj:2021:NERS},
 isbn = {978-1-939133-22-9},
 location = {Virtual},
 pages = {295--312},
 publisher = {USENIX Association},
 series = {OSDI'21},
 title = {NrOS: Effective Replication and Sharing in an Operating System},
 url = {https://www.usenix.org/conference/osdi21/presentation/bhardwaj},
 year = {2021}
}

Contact

The University of British Columbia
Department of Computer Science
2366 Main Mall
ICICS Building, Office 341
Vancouver, BC V6T 1Z4
Canada

achreto [at] cs.ubc.ca
+1 604 827 2446