Article development led by
The biosciences need an image format
capable of high performance and long-term
maintenance. Is HDF5 the answer?
By Matthew T. Dougherty, Michael J. Folk, Erez Zadok,
Herbert J. Bernstein, Frances C. Bernstein,
Kevin W. Eliceiri, Werner Benger, Christoph Best
THE BIOLOGICAL SCIENCES need a generic image format
suitable for long-term storage and capable of
handling very large images. Images convey profound
ideas in biology, bridging across disciplines.
Digital imagery began 50 years ago as an obscure
technical phenomenon. Now it is an indispensable
computational tool. It has produced a variety of
incompatible image file formats, most of which are already obsolete.
Several factors are forcing the obsolescence:
rapid increases in the number of pixels per image;
acceleration in the rate at which images
are produced; changes in image designs to cope with new scientific instrumentation and concepts; collaborative
requirements for interoperability of images collected in different labs on different instruments; and research metadata dictionaries that must support
frequent and rapid extensions. These
problems are not unique to the biosciences. Lack of image standardization is
a source of delay, confusion, and errors
for many scientific disciplines.
There is a need to bridge biological
and scientific disciplines with an image framework capable of high computational performance and interoperability. To be suitable for archiving, such
a framework must also be able to maintain
images far into the future. Some frameworks represent partial solutions: a
few, such as XML, are primarily suited
for interchanging metadata; others,
such as CIF (Crystallographic Information Framework), 2 are primarily suited
for the database structures needed for
crystallographic data mining; still others, such as DICOM (Digital Imaging
and Communications in Medicine), 3
are primarily suited for the domain of
clinical medical imaging.
What is needed is a common image
framework able to interoperate with
all of these disciplines, while providing high computational performance.
HDF (Hierarchical Data Format) 6 is
such a framework, presenting a historic opportunity to establish a coin
of the realm by coordinating the imagery of many biological communities.
Overcoming the digital confusion of
incoherent bio-imaging formats will
result in better science and wider accessibility to knowledge.
Formats, Frameworks, and Images
Digital imagery and computer technology serve a number of diverse biological communities with terminology
differences that can result in very different perspectives. Consider the word
format. To the data-storage community the hard-drive format will play a ma-