exit from the software.
The inflexibility of current biological image file designs prevents them from adapting to future modalities and dimensionality. Rapid advances in biological instrumentation and computational analysis are leading to complex imagery involving novel physical and statistical pixel specifications.
The inability to assemble different communities' imagery into an overarching image model introduces ambiguity into the analysis. The integration of various coordinate systems can be an impassable obstacle if not properly organized.
There is an increasing need to correlate
images of different modalities in order
to observe spatial continuity from millimeter to angstrom resolutions.
The non-archival quality of images
undermines their long-term value. The
current designs usually do not provide
basic archival features recommended
by the Digital Library Federation, nor
do they address issues of provenance.
Frequently, the documentation of a
community image format is incomplete, outdated, or unavailable, thus
eroding the ability to interpret the digital artifact properly.
It would be desirable to adopt an existing scientific, medical, or computer image format and simply inherit its benefits. All image formats have their strengths and weaknesses. They tend to fall into two categories: generic and specialized formats. Generic image formats usually have a fixed dimensionality or pixel design. For example, MPEG2⁹ is suitable for many applications as long as the imagery is 2D spatial plus 1D temporal, uses the red-green-blue modality, and is lossy compressed for the physiological response of the eye.
Alternatively, the specialized image formats suffer the difficulties of the image formats we are already using. For example, DICOM³ (the medical imaging standard) and FITS⁵ (the astronomical imaging standard) store their pixels as 2D slices, although DICOM does incorporate MPEG2 for video-based applications.
The ability to tile (2D), brick (3D), or chunk (nD) is required to access very large images. Although this is conceptually simple, the software is not, and it must be tested carefully or subsequent datasets risk corruption. That risk would be unacceptable for operational software used in data repositories and research. This function and its certification testing are critical features of HDF software that are not readily available in any other format.
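As a sketch of what chunked access looks like in practice, the following uses the HDF5 C API to create a 3D dataset stored as 64x64x64 bricks; the file name, dataset path, and sizes are illustrative, not part of any standard:

    #include "hdf5.h"

    int main(void)
    {
        /* 3D "brick" layout: a large volume stored as 64x64x64 chunks. */
        hsize_t dims[3]  = {2048, 2048, 2048};   /* full volume */
        hsize_t chunk[3] = {64, 64, 64};         /* one brick   */

        hid_t file  = H5Fcreate("volume.h5", H5F_ACC_TRUNC,
                                H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(3, dims, NULL);

        /* Chunking (and optional compression) is a dataset-creation
           property; each brick is compressed independently. */
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 3, chunk);
        H5Pset_deflate(dcpl, 4);                 /* gzip level 4 */

        hid_t dset = H5Dcreate2(file, "/tomogram", H5T_NATIVE_USHORT,
                                space, H5P_DEFAULT, dcpl, H5P_DEFAULT);

        /* Bricks are then written and read individually via hyperslab
           selections, so the full 16GB volume never has to fit in RAM. */

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

Because each brick is an independent storage unit, a reader can fetch one region of interest without streaming the rest of the file, which is exactly the access pattern that simple 2D slice formats cannot support.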
The objectives of these acquisition communities are identical, requiring performance, interoperability, and archiving. There is a real need for the different bio-imaging communities to coordinate within the same HDF5 data file: using identical high-performance methods to manage pixels, avoiding namespace collisions between the biological communities, and adopting the same archival best practices. All of these would benefit downstream communities such as visualization developers and global repositories.
Performance. The design of an image
file format and the subsequent organization of stored pixels determine the
performance of computation because
of various hardware and software data-path bottlenecks. For example, many
specialized biological image formats
use simple 2D pixel organizations,
frequently without the benefit of compression. These 2D pixel organizations
are ill suited for very large 3D images
such as electron tomograms or 5D optical images. Those bio-imaging files
have sizes that are orders of magnitude
larger than the RAM of computers.
Worse, widening gaps have formed between CPU/memory speeds, persistent
storage speeds, and network speeds.
These gaps lead to significant delays
in processing massive data sets. Any file format for massive data has to account for the complex behavior of software layers, all the way from the application, through middleware, down to operating-system device drivers. A generic n-dimensional multimodal image format will require new instantiation and infrastructure to implement new types of data buffers and caches to scale large datasets into much smaller RAM; much of this has already been resolved in HDF5.
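HDF5 exposes some of this cache machinery directly. The sketch below sizes the per-dataset chunk cache so a working set of bricks stays in RAM between reads; the dataset name carries over from the earlier sketch, and the cache parameters are illustrative:

    #include "hdf5.h"

    int main(void)
    {
        hid_t file = H5Fopen("volume.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

        /* Size the per-dataset chunk cache to hold a working set of
           bricks: 64MB, with a prime number of hash slots well above
           the number of cached chunks to reduce collisions. */
        hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
        H5Pset_chunk_cache(dapl, 12421, 64 * 1024 * 1024,
                           H5D_CHUNK_CACHE_W0_DEFAULT);

        hid_t dset = H5Dopen2(file, "/tomogram", dapl);

        /* ... hyperslab reads now hit the cache for revisited bricks ... */

        H5Dclose(dset);
        H5Pclose(dapl);
        H5Fclose(file);
        return 0;
    }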
Interoperability. Historically the acquisition communities have defined
custom image formats. Downstream
communities, such as visualization
and modeling, attempt to implement
these formats, forcing the communities to confront design deficiencies.
Basic image metadata definitions
such as rank, dimension, and modality
must be explicitly defined so the downstream communities can easily participate. Different research communities
must be able to append new types of
metadata to the image, enhancing the
imagery as it progresses through the
pipeline. Ongoing advances in the acquisition communities will continue
to produce new and significant image modalities that feed this image
pipeline. Enabling downstream users to easily access pixels and append
their community metadata supports
interoperability, ultimately leading to
fundamental breakthroughs in biology. This is not to suggest that different communities’ metadata can be or
should be uniformly defined as a single biological metadata schema and
ontology in order to achieve an effective image format.
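HDF5 attributes are a natural carrier for such layered metadata: the acquisition community writes the core definitions, and a downstream community appends its own without disturbing anything else. A minimal sketch; all attribute names and values here are hypothetical, not a proposed schema:

    #include <string.h>
    #include "hdf5.h"

    /* Attach a fixed-length string attribute to any HDF5 object. */
    static void set_string_attr(hid_t obj, const char *name,
                                const char *value)
    {
        hid_t space = H5Screate(H5S_SCALAR);
        hid_t type  = H5Tcopy(H5T_C_S1);
        H5Tset_size(type, strlen(value) + 1);
        hid_t attr = H5Acreate2(obj, name, type, space,
                                H5P_DEFAULT, H5P_DEFAULT);
        H5Awrite(attr, type, value);
        H5Aclose(attr); H5Tclose(type); H5Sclose(space);
    }

    int main(void)
    {
        hid_t file = H5Fopen("volume.h5", H5F_ACC_RDWR, H5P_DEFAULT);
        hid_t dset = H5Dopen2(file, "/tomogram", H5P_DEFAULT);

        /* Core image metadata written by the acquisition community. */
        int rank = 3;
        hid_t space = H5Screate(H5S_SCALAR);
        hid_t attr  = H5Acreate2(dset, "rank", H5T_NATIVE_INT, space,
                                 H5P_DEFAULT, H5P_DEFAULT);
        H5Awrite(attr, H5T_NATIVE_INT, &rank);
        H5Aclose(attr); H5Sclose(space);
        set_string_attr(dset, "modality", "electron_tomography");

        /* A downstream community appends its own annotation without
           touching the pixels or anyone else's metadata. */
        set_string_attr(dset, "segmentation_tool", "example-segmenter-0.1");

        H5Dclose(dset);
        H5Fclose(file);
        return 0;
    }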
Archiving. Scientific images generally lack archival design features. As the sophistication of bio-imagery improves, the demand to place this imagery into long-term global repositories will grow. This is being done by the Electron Microscopy Databank⁴ in joint development by the National Center for Macromolecular Imaging, the RCSB (Research Collaboratory for Structural Bioinformatics) at Rutgers University, and the European Bioinformatics Institute. Efforts such as the Open Microscopy Environment¹⁴ are also developing bio-image informatics tools for lab-based data sharing and data mining of biological images, efforts that likewise require practical image formats for long-term storage and retrieval. Because of
the evolving complexity of bio-imagery
and the need to subscribe to archival
best practices, an archive-ready image
format must be self-describing. That is, there must be sufficient infrastructure within the image file design to properly document the content, context, and structure of the pixels and related community metadata, thereby minimizing the reliance on external documentation for interpretation.
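Self-description is observable in HDF5 today: a tool such as h5dump can list every object and attribute in a file with no side documentation, and a program can do the same through the visitor API. A short sketch using the HDF5 1.8 signature of H5Ovisit (later releases add a fields argument):

    #include <stdio.h>
    #include "hdf5.h"

    /* Print every object in the file; no external schema is needed
       because names, types, and attributes live inside the file. */
    static herr_t print_object(hid_t obj, const char *name,
                               const H5O_info_t *info, void *op_data)
    {
        (void)obj; (void)op_data;
        const char *kind = (info->type == H5O_TYPE_GROUP)   ? "group"
                         : (info->type == H5O_TYPE_DATASET) ? "dataset"
                         : "other";
        printf("%-7s %s (%llu attributes)\n", kind, name,
               (unsigned long long)info->num_attrs);
        return 0;
    }

    int main(void)
    {
        hid_t file = H5Fopen("volume.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        H5Ovisit(file, H5_INDEX_NAME, H5_ITER_NATIVE, print_object, NULL);
        H5Fclose(file);
        return 0;
    }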
The Inertia of Legacy Software