existing storage interfaces allows file systems to take advantage of new facilities on devices that provide them. Storage system designers can choose whether to require devices that provide those interfaces or to implement a work-alike facility that they disable when it is not needed. Device vendors can decide whether supporting a richer interface represents a sufficient competitive advantage. Though this approach may never lead to an optimal state, it may allow the industry to navigate monotonically to a sufficient local maximum.
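The work-alike pattern described above can be sketched as feature detection with a host-side fallback. The device interface, class names, and discard semantics below are all hypothetical, chosen only to illustrate the idea:

```python
class Device:
    """Minimal stand-in for a block device (hypothetical interface)."""
    def __init__(self, supports_discard):
        self.supports_discard = supports_discard
        self.discarded = set()

    def discard(self, lba, length):
        # Device-side discard/TRIM: the drive can reclaim these blocks.
        self.discarded.update(range(lba, lba + length))


class FreeSpaceHint:
    """File-system layer that uses the device's discard facility when
    present, and falls back to a host-side free-block map otherwise."""
    def __init__(self, device):
        self.device = device
        self.shadow_free = set()  # the work-alike, used only as a fallback

    def release(self, lba, length):
        if self.device.supports_discard:
            self.device.discard(lba, length)
        else:
            self.shadow_free.update(range(lba, lba + length))
```

The fallback path is effectively disabled when the richer interface is present, which is the choice the text describes storage system designers making.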
The chicken and the egg
There are still other ways to construct a storage system around flash. A more radical approach is to go further than DirectFS, assigning additional high-level responsibilities to the file system, such as block management, wear leveling, read-disturb awareness, and error correction. This would allow for a complete reorganization of the software abstractions in the storage system, ensuring sufficient information for proper optimization where today's layers must cope with suboptimal information and communication. Again, this approach requires a vendor that can assert broad control over the whole system—from the file system to the interface, controller, and flash media. It is certainly tenable for closed, proprietary systems—indeed, several vendors are pursuing this approach—but for it to gain traction as a new open standard would be difficult.
The SSDs that exist today for the volume market are cheap and fast, but their performance is inconsistent and their reliability insufficient. Higher-level software designed with full awareness of those shortcomings could turn that commodity iron into gold: without redesigning part or all of the I/O interface, those same SSDs could form the basis of a high-performing and highly reliable storage system.
Rather than designing a file system around the properties of NAND flash, this approach would treat the commodity SSDs themselves as the elementary unit of raw storage. NAND flash memory already has complicated intrinsic properties; the emergent properties of an SSD are even more obscure and varied. A common pathology with SSDs, for example, is variable performance when servicing concurrent or interleaved read and write operations. Understanding these pathologies sufficiently and creating higher-level software to accommodate them would represent the flash version of an existential software parable: enterprise quality from commodity components. It is a phenomenon that the storage world has seen before with disks; software such as ZFS from Sun has produced fast, reliable systems from cheap components.
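As a thought experiment, higher-level software might accommodate the read/write interleaving pathology by segregating the two classes of I/O, batching writes so reads rarely contend with them. Everything in this sketch (the class, the threshold, the in-memory log) is hypothetical:

```python
from collections import deque

class SegregatingScheduler:
    """Toy I/O scheduler: queues writes and issues them in bursts so
    reads rarely interleave with writes -- one way higher-level software
    might paper over an SSD whose latency degrades under mixed workloads.
    Thresholds and interfaces here are invented for illustration."""
    def __init__(self, flush_threshold=8):
        self.pending_writes = deque()
        self.flush_threshold = flush_threshold
        self.log = []  # records the order operations reach the device

    def read(self, lba):
        # Reads are dispatched immediately; they never wait behind writes.
        self.log.append(('read', lba))

    def write(self, lba, data):
        # Writes accumulate and are flushed as one uninterrupted burst.
        self.pending_writes.append((lba, data))
        if len(self.pending_writes) >= self.flush_threshold:
            self.flush()

    def flush(self):
        while self.pending_writes:
            lba, _data = self.pending_writes.popleft()
            self.log.append(('write', lba))
```

A production system would bound write latency with a timer and persist the pending writes somewhere safe before acknowledging them; the point here is only that the segregation policy lives above the SSD, not inside it.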
Next for flash
The lifespan of flash as a relevant technology is a topic of vigorous debate. While flash has ridden its price and density trends to a position of relevance, some experts anticipate fast-approaching limits to the physics of scaling NAND flash memory. Others foresee several decades of flash innovation. Whether it is flash or some other technology, nonvolatile solid-state memory will be a permanent part of the storage hierarchy, having filled the yawning gap between hard-drive and CPU speeds [8].
The next evolutionary stage should see file systems designed explicitly for the properties of solid-state media rather than relying on an intermediate layer to translate. The various approaches are each imperfect. Incremental changes to the storage interface may never reach the true acme. Creating a new interface for flash might be untenable in the market. Treating SSDs as the atomic unit of storage may be just another half-measure, and a technically difficult one at that.
References
1. Btrfs wiki; https://btrfs.wiki.kernel.org/index.php/
2. Cornwell, M. Anatomy of a solid-state drive. ACM Queue 10, 10 (2012); http://queue.acm.org/detail.
3. Elliott, R. and Batwara, A. Notes to T10 Technical Committee. 11-229r4 SBC-4 SPC-5 atomic writes and 229r4.pdf; 12-086r2 SBC-4 SPC-5 scattered writes, optionally atomic; http://www.t10.org/cgi-bin/ac.pl?t=d&f=12-086r2.pdf; 12-087r2 SBC-4 SPC-5 gathered reads, optionally atomic; http://www.t10.
4. Gray, J. and Fitzgerald, B. Flash disk opportunity for server applications. ACM Queue 6, 4 (2008); http://
5. Hitz, D., Lau, J. and Malcolm, M. File system design for an NFS file server appliance. USENIX Winter 1994 Technical Conference; http://dl.acm.org/
6. Josephson, W.K., Bongo, L.A., Li, K. and Flynn, D. DFS: A file system for virtualized flash storage. ACM Transactions on Storage 6, 3 (2010); http://dl.acm.org/
7. Leventhal, A. Flash storage today. ACM Queue 6, 4 (2008).
8. Leventhal, A. Triple-parity RAID and beyond. ACM Queue 7, 11 (2009); http://queue.acm.org/detail.
9. Moshayedi, M. and Wilkison, P. Enterprise SSDs. ACM Queue 6, 4 (2008); http://queue.acm.org/detail.
10. The OpenSSD Project; http://www.openssd-project.
11. Pure Storage FlashArray; http://www.purestorage.
Adam H. Leventhal is the CTO at Delphix, a database virtualization company. Previously he served as lead flash engineer for Sun and then Oracle, where he designed flash integration in the ZFS Storage Appliance, Exadata, and