˲ Transactions: Support for ACID
(atomicity, consistency, isolation, durability) transactions offering well-defined guarantees about modifications
to data structures that reside in persistent memory and are accessible by
˲ Memory leaks and permanent corruption: Persistence makes memory
leaks, and errors that are normally recoverable through program restart or
reset, more pernicious. Strong safety
guarantees are needed to avoid permanent corruption.
˲ Performance: Providing tailored capabilities and leveraging the advantages of low latency and high throughput
enabled by NVDIMM technology.
˲ Scalability: Scaling data structures
to multi-terabytes also requires scaling
of metadata and region management
˲ Pointer swizzling: Modifying embedded (virtual address) pointer
references for object/data structure
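One common form of pointer swizzling can be sketched as follows: references are stored in the persistent region as position-independent offsets, and are converted to direct in-memory references when the structure is loaded. This is a minimal illustration only; the node layout, the `NULL` sentinel, and every function name are hypothetical, and a `bytearray` stands in for the persistent region.

```python
import struct

NODE = struct.Struct("<qq")  # (value, next_offset); hypothetical on-region layout
NULL = -1                    # hypothetical sentinel offset meaning "no next node"


class Node:
    """In-memory (swizzled) node that holds a direct object reference."""

    def __init__(self, value):
        self.value = value
        self.next = None


def unswizzle(head, region):
    """Write a linked list into region, replacing references with offsets."""
    offset = 0
    node = head
    while node is not None:
        next_off = offset + NODE.size if node.next is not None else NULL
        NODE.pack_into(region, offset, node.value, next_off)
        offset += NODE.size
        node = node.next
    return 0 if head is not None else NULL  # offset of the head node


def swizzle(region, head_off):
    """Rebuild direct references from the offsets stored in region."""
    head = prev = None
    off = head_off
    while off != NULL:
        value, next_off = NODE.unpack_from(region, off)
        node = Node(value)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
        off = next_off
    return head


def build_list(values):
    """Helper: build an in-memory linked list from a Python list."""
    head = None
    for v in reversed(values):
        node = Node(v)
        node.next = head
        head = node
    return head


def to_values(head):
    """Helper: collect node values by following direct references."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

Because only offsets are stored, the region can be copied or mapped at a different address and still be traversed after swizzling, for example `to_values(swizzle(bytearray(region), head_off))`.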
The real impact of NVDIMMs remains to be seen. However, work by
Coburn et al. [6] on NV-Heaps has shown
that for certain applications the move
from a transactional database to persistent memory can bring significant
NVDIMM-based persistent memory
lends itself to integration with user
space approaches because it inherently provides access directly to the user
space application (although mapping
and allocation may remain under the kernel’s control). This enables efficient,
zero-copy DMA-centric movement of
data through the memory hierarchy
and into the storage device. A longer-term vision is for a converged memory-storage paradigm whereby traditional
storage services (for example, durability, encryption) can be layered into the
memory paradigm. However, to date,
this topic remains largely unaddressed
by the community.
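The division of labor described above, where the kernel sets up the mapping and the application then reads and writes the data directly, can be sketched with an ordinary memory-mapped file. This is a stand-in under stated assumptions, not an NVDIMM implementation: the file path, region size, counter layout, and function names are all hypothetical. On real persistent memory the mapping would typically come from a DAX-enabled file system and durability would rely on CPU cache flushes rather than the `msync`-style `flush()` used here.

```python
import mmap
import os
import struct
import tempfile

COUNTER = struct.Struct("<Q")  # one 64-bit counter at offset 0 (hypothetical layout)
REGION_SIZE = 4096             # hypothetical region size


def map_region(path, size=REGION_SIZE):
    """Ask the kernel to create and map the region; return a writable mapping."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)      # newly added bytes read back as zero
        return mmap.mmap(fd, size)  # shared, read/write mapping by default
    finally:
        os.close(fd)                # the mapping keeps its own descriptor


def increment(region):
    """Update the counter in place with plain loads and stores, then flush."""
    (count,) = COUNTER.unpack_from(region, 0)  # load from the mapping
    COUNTER.pack_into(region, 0, count + 1)    # store into the mapping
    region.flush()                             # push the change to the backing file
    return count + 1


def run_demo():
    """Increment twice, unmap, remap, and return the value found afterward."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "region.bin")
        region = map_region(path)
        increment(region)
        increment(region)
        region.close()
        region = map_region(path)  # same size, so existing contents are kept
        (count,) = COUNTER.unpack_from(region, 0)
        region.close()
        return count
```

After the mapping is established, no read or write system call is involved in updating the counter; the kernel is used only for setup and for the explicit flush.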
Mainstream operating systems are
based on IO architectures with a 50-year
heritage. New devices now challenging
these traditional designs
bring unprecedented levels of concurrency
and performance. The result is
that we are entering an era of CPU-IO
performance inversion, where CPU resources
are becoming the bottleneck.
Careful consideration of execution
paths is now paramount to effective
User space, kernel-bypass strategies provide a vehicle to explore and
quickly develop new IO stacks. These
can be used to exploit alignment of
requirements and function, becoming readily tailored and optimized to
meet the specific needs of an application. The flexibility of a user space software
implementation (as opposed to kernel
space) eases development
and debugging, and allows existing application libraries (for
example, machine learning) to be leveraged.
For the next decade, microprocessor
design trends are expected to continue
to increase on-die transistor count. As
instruction-level parallelism and clock
frequency increases have reached a
plateau, increased core count and on-chip accelerators are the most likely
differentiators for future processor
generations. There is also the possibility of “big” and “little” cores whereby
heterogeneous cores, with different
capabilities (for example, pipelining, floating point units, and clock
frequency), exist on the same processor package. This is already evident in
ARM-based mobile processors. Such
an approach could help drive a shift
away from interrupt-based IO, toward
polling IO whereby “special” cores are
dedicated to IO processing (possibly at
a lower clock frequency). This would
eliminate both context switches and
cache pollution, and would also enable
improved energy management and determinism in the system.
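The polling model can be illustrated with a small sketch in which one dedicated thread repeatedly drains a completion queue instead of sleeping until it is woken. This is only an analogy under stated assumptions: the `PolledQueue` class and its methods are hypothetical, a `deque` stands in for a device completion queue, and a Python thread is neither pinned to a core nor free of scheduler involvement in the way a dedicated polling core would be.

```python
import threading
from collections import deque


class PolledQueue:
    """Hypothetical completion queue drained by a single polling thread."""

    def __init__(self):
        self._completions = deque()  # filled by the "device", drained by the poller
        self._stop = threading.Event()
        self.handled = []            # completions processed so far, in order

    def post(self, item):
        """Device side: publish a completion without signaling anyone."""
        self._completions.append(item)

    def poll_once(self):
        """Drain whatever is ready right now; never blocks. Returns the count."""
        n = 0
        while True:
            try:
                item = self._completions.popleft()
            except IndexError:
                return n
            self.handled.append(item)
            n += 1

    def run(self):
        """Poller loop: spin on the queue until asked to stop."""
        while not self._stop.is_set():
            self.poll_once()
        self.poll_once()  # final drain of anything posted before stop()

    def stop(self):
        self._stop.set()


def run_demo(count=100):
    """Post `count` completions to a running poller and return what it handled."""
    q = PolledQueue()
    poller = threading.Thread(target=q.run)
    poller.start()
    for i in range(count):
        q.post(i)
    q.stop()
    poller.join()
    return q.handled
```

The poller never waits on a lock or a wake-up signal for new work, which is the property that removes the context switch from the completion path; the cost is that the polling thread consumes CPU time even when the queue is empty.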
Large-capacity, NVDIMM-based
persistent memory is on the horizon.
The availability of potentially up to
terabytes of persistent memory, with
sub-microsecond access latencies and
cache-line addressability, will accelerate the need to make changes in the IO
software architecture. User space IO
strategies are well positioned to meet
the demands of high-performance
storage devices and to provide an ecosystem that can effectively adopt load/
store addressable persistence.
1. Abramson, D. et al. Intel virtualization technology for
directed IO. Intel Technology J. 10, 3 (2006), 179–192.
2. Atkinson, M. and Morrison, R. Orthogonally Persistent
Object Systems. The VLDB J. 4, 3 (July 1995), 319–402.
3. Belay, A., Prekas, G., Klimovic, A., Grossman, S.,
Kozyrakis, C. and Bugnion, E. IX: A protected
dataplane operating system for high throughput and
low latency. In Proceedings of USENIX Operating
Systems Design and Implementation, Oct. 2014,
4. Bhattacharya, S.P. A Measurement Study of the
Linux TCP/IP Stack Performance and Scalability on
SMP systems, Communication System Software and
5. Bjørling, M., Axboe, J., Nellans, D. and Bonnet, P.
Linux block IO: Introducing multi-queue SSD access
on multi-core systems. In Proceedings of the 6th
International Systems and Storage Conf., 2013,
22:1–22:10. ACM, New York, NY, USA.
6. Coburn, J. et al. NV-Heaps: Making persistent objects
fast and safe with next-generation, non-volatile
memories. SIGPLAN Notices 46, 3 (Mar. 2011), 105–118.
7. Dearle, A., Kirby, G.N.C. and Morrison, R. Orthogonal
persistence revisited. In Proceedings of the 2nd
International Conference on Object Databases, 2010,
Springer Berlin, Heidelberg.
8. Gorman, M. Understanding the Linux Virtual Memory
Manager. Prentice Hall PTR, Upper Saddle River, NJ,
9. Grundler, G. Porting drivers to HP ZX1. Ottawa Linux
10. Intel Corporation. Intel 64 and IA-32 Architectures
Optimization Reference Manual. No. 248966-033,
11. Intel Corporation. PCI-SIG Single Root IO
Virtualization Support in Intel® Virtualization
Technology for Connectivity; https://www.intel.com/
12. Kannan, S., Gavrilovska, A. and Schwan, K. PVM:
Persistent virtual memory for efficient capacity
scaling and object storage. In Proceedings of the 11th
European Conference on Computer Systems, 2016,
13:1–13:16. ACM, New York, NY, USA.
13. Kemper, A. and Kossmann, D. Adaptable pointer
swizzling strategies in object bases: Design,
realization, and quantitative analysis. International J.
Very Large Data Bases 4, 3 (July 1995), 519–567.
14. Klimovic, A., Litz, H. and Kozyrakis, C. ReFlex: Remote
Flash ≈ Local Flash. In Proceedings of the 22nd
International Conference on Architectural Support
for Programming Languages and Operating Systems,
2017, 345–359. ACM, New York, NY.
15. Kumar, P. and Huang, H. Falcon: Scaling IO
performance in multi-SSD volumes. In Proceedings of
USENIX Annual Technical Conference (Santa Clara,
CA, July 2017).
16. Lewin-Berlin, S. Exploiting multicore systems with
Cilk. In Proceedings of the 4th International Workshop
on Parallel and Symbolic Computation, 2010, 18–19.
ACM, New York, NY, USA.
17. Lin, F.X. and Liu, X. Memif: Towards programming
heterogeneous memory asynchronously. SIGARCH
Computing Architecture News 44, 2 (Mar. 2016),
18. Siemon, D. Queueing in the Linux network stack. Linux
J. 231 (July 2013).
19. Tuning throughput performance for Intel Ethernet
adapters (2017);
http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005811.html
20. Unrau, R. and Krieger, O. Efficient sleep/wake-up
protocols for user-level IPC. In Proceedings
of the 1998 International Conference on
21. Volos, H., Tack, A.J. and Swift, M.M. Mnemosyne:
Lightweight persistent memory. SIGPLAN Notices 47,
4 (Mar. 2011), 91–104.
22. Walker, B. SPDK: Building blocks for scalable high-performance storage applications. SNIA Storage
Developer Conference, 2016, Santa Clara, CA, USA;
Daniel Waddington (firstname.lastname@example.org) is
a research staff member at IBM Almaden Research
Center in San Jose, CA, USA.
Jim Harris (email@example.com) is a principal
engineer in the Network Platforms Group at Intel
Corporation, Chandler, AZ, USA.
© 2018 ACM 0001-0782/18/11 $15.00