Jaeyoung Do (firstname.lastname@example.org) is a researcher at
Microsoft Research, Redmond, WA, USA. He is leading a
project, SoftFlash, which aims to use programmable SSDs
in cloud datacenters.
Sudipta Sengupta (email@example.com) is leading
new initiatives in artificial intelligence/deep learning at
Amazon AWS, Seattle, WA, USA; the research reported in
this article was done while he was at Microsoft Research,
Redmond, WA, USA.
Steven Swanson (firstname.lastname@example.org) is a professor
in the Department of Computer Science and Engineering
at the University of California, San Diego, USA.
Copyright held by authors/owners.
Publication rights licensed to ACM.
provides opportunities for embracing
them as a first-class programmable
platform in cloud datacenters, enabling software-hardware innovation
that could bridge the gap between application/OS needs and storage capabilities/limitations. We hope to shed
light on the future of software-defined
storage and help chart a direction for
designing, building, deploying, and
leveraging a software-defined storage
architecture for cloud datacenters.
This work was supported in part by
National Science Foundation Award
using NVMe over Fabrics (NVMe-oF).q
With the programmable storage substrate, we can think beyond the
single-device block interface. For example, a microserver inside a
storage device can expose a richer interface, such as a distributed
key-value store or distributed streams. Alternatively, the storage
infrastructure can be managed as a fabric rather than as individual
devices. The programmable storage substrate can also provide high-level
datacenter capabilities (such as backup, data snapshots, replication,
deduplication, and tiering) that are typically supported in a
datacenter server environment where compute and storage are separated.
The programmable storage substrate can therefore be viewed as a
hyper-converged infrastructure in which storage, networking, and
compute are tightly coupled for low-latency, high-throughput access
while still providing availability.
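To make the idea of a richer-than-block interface concrete, the following is a minimal Python sketch, not an implementation from the article. It assumes a toy in-memory block device and a hypothetical ProgrammableStorageNode class whose embedded microserver exposes put/get by key and forwards each write directly to peer devices, standing in for device-to-device traffic that bypasses a remote host. All class names, method names, and the 4,096-byte block size are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4096  # bytes per logical block (illustrative assumption)


class BlockDevice:
    """Toy stand-in for the conventional single-device block interface."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[bytes] = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, lba: int, data: bytes) -> None:
        if len(data) > BLOCK_SIZE:
            raise ValueError("value does not fit in one block")
        # Pad to a full block, as a block interface only moves whole blocks.
        self.blocks[lba] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, lba: int) -> bytes:
        return self.blocks[lba]


class ProgrammableStorageNode:
    """Hypothetical device whose microserver exposes a key-value interface."""

    def __init__(self, name: str, num_blocks: int = 1024) -> None:
        self.name = name
        self.device = BlockDevice(num_blocks)
        self.index: Dict[str, Tuple[int, int]] = {}  # key -> (lba, length)
        self.next_free = 0
        self.peers: List["ProgrammableStorageNode"] = []

    def _allocate(self) -> int:
        if self.next_free >= len(self.device.blocks):
            raise RuntimeError("device full")
        lba = self.next_free
        self.next_free += 1
        return lba

    def _store_local(self, key: str, value: bytes) -> None:
        # Reuse the existing block on overwrite; otherwise take a new one.
        lba = self.index[key][0] if key in self.index else self._allocate()
        self.device.write_block(lba, value)
        self.index[key] = (lba, len(value))

    def put(self, key: str, value: bytes) -> None:
        self._store_local(key, value)
        # Replicate device to device; no host sits on this path in the sketch.
        for peer in self.peers:
            peer._store_local(key, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self.index.get(key)
        if entry is None:
            return None
        lba, length = entry
        return self.device.read_block(lba)[:length]


# Usage: a write to one device becomes readable from its replica.
primary = ProgrammableStorageNode("ssd-a")
replica = ProgrammableStorageNode("ssd-b")
primary.peers.append(replica)
primary.put("user:42", b"hello")
print(replica.get("user:42"))  # b'hello'
```

In this sketch the caller never sees logical block addresses: the mapping from keys to blocks lives inside the device, which is the kind of interface change a programmable substrate would permit. A real design would also need durability, failure handling, and a network transport, all of which are omitted here.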
In this article, we have presented our
vision of a fully programmable storage substrate in cloud datacenters,
one that allows application developers to
innovate in the storage infrastructure
at cloud speed, as they already do in the software application/OS infrastructure. The
programmability evolution in SSDs
q A technology specification designed for non-volatile memories to transfer data between
a host and a target system/device over a network. Approximately 90% of the NVMe-oF protocol is the same as the NVMe protocol.
Figure 7. Enabling a programmable storage substrate decoupled from the host substrate.
Direct traffic between programmable storage devices (with a network
interface) without involving a remote host.