months to upload. Uploading one petabyte at 800Mbps should keep you going for four months.
It’s time to consider an alternative.
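The four-month figure can be checked with a few lines of arithmetic. The sketch below assumes a decimal petabyte (10^15 bytes) and a link that sustains its full 800Mbps for the whole transfer; the function name `upload_days` is illustrative and does not come from the article.

```python
def upload_days(volume_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to push volume_bytes through a link sustained at the given rate."""
    seconds = volume_bytes * 8 / link_bits_per_second  # bytes -> bits, then divide by rate
    return seconds / 86_400                            # 86,400 seconds per day


ONE_PETABYTE = 10**15  # assumption: decimal petabyte, in bytes

days = upload_days(ONE_PETABYTE, 800e6)  # 800 Mbps sustained
print(f"{days:.1f} days, about {days / 30:.1f} months")
```

Under these assumptions the transfer takes a little under 116 days. Any protocol overhead or contention on the link only lengthens it.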
Ship It!
That alternative is to copy the data to a storage appliance and ship the appliance to the datacenter, where the data is copied into cloud storage. This is the Ship It! strategy. Under what circumstances is this a viable alternative to uploading the data directly into the cloud?
The mathematics of shipping data.
When data is read out from a drive, it
travels from the physical drive hardware (for example, the HDD platter)
to the on-board disk controller (the
electronic circuitry on the drive).
From there the data travels to the host
controller (a.k.a. the host bus adapter, a.k.a. the interface card) and finally to the host system (for example,
the computer with which the drive is
interfaced). When data is written to
the drive, it follows the reverse route.
When data is copied from a server to a storage appliance (or vice versa), the data must travel through an additional physical layer, such as an Ethernet or USB connection between the server and the storage appliance.
Figure 1 is a simplified view of the
data flow when copying data to a storage appliance. The direction of data
flow shown in the figure is conceptually reversed when the data is copied
out from the storage appliance to the
cloud server.
Note that often the storage appliance may be nothing more than a
single hard drive, in which case the
data flow from the server to this drive
is basically along the dotted line in
the figure.
Given this data flow, a simple way to express the time needed to transfer the data to the cloud using the Ship It! strategy is shown in Equation 1, where:
V_content is the volume of data to be transferred, in megabytes (MB).
Speed_copyIn is the sustained rate in MBps (megabytes per second) at which data is copied from the source drives to the storage appliance. This speed is essentially the minimum of three speeds: the speed at which the controller reads data out of the source drive and transfers it to the host computer with which it interfaces; the speed at which the storage appliance's controller receives data from its interfaced host and writes it into the storage appliance; and the speed of data transfer between the two hosts. For example, if the two hosts are connected over Gigabit Ethernet or Fibre Channel and the storage appliance can write data at 600MBps, but the source drive and its controller can emit data at only 20MBps, then the effective copy-in speed can be at most 20MBps.
Speed_copyOut is similarly the sustained rate in MBps at which data is copied out of the storage appliance and written into cloud storage.
T_transit is the time, in hours, that the courier service takes to move the shipment from source to destination.
T_overhead is the overhead time in hours. This can include the time required to buy the storage devices (for example, tapes), set them up for data transfer, pack and create the shipment, and drop it off at the shipper's location. At the receiving end, it includes the time needed to process the shipment received from the shipper, store it temporarily, unpack it, and set it up for data transfer.
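The terms defined above can be combined in a short sketch. Equation 1 itself is not reproduced in this text, so the form used here is an assumption reconstructed from the definitions: each copy time is the data volume divided by the sustained rate, converted from seconds to hours, and the transit and overhead times are added on top. The function names, the 125MBps link figure (the theoretical ceiling of Gigabit Ethernet), and the copy-out rate, transit time, and overhead time in the example are all hypothetical values chosen for illustration.

```python
def effective_copy_speed(source_read_mb_per_s: float,
                         link_mb_per_s: float,
                         appliance_write_mb_per_s: float) -> float:
    """The slowest of the three stages bounds the sustained copy rate (MB per second)."""
    return min(source_read_mb_per_s, link_mb_per_s, appliance_write_mb_per_s)


def ship_it_hours(volume_mb: float,
                  speed_copy_in: float,
                  speed_copy_out: float,
                  transit_hours: float,
                  overhead_hours: float) -> float:
    """Assumed form of the transfer time: copy-in + copy-out + transit + overhead, in hours."""
    copy_in_hours = volume_mb / speed_copy_in / 3600    # MB / (MB/s) = seconds -> hours
    copy_out_hours = volume_mb / speed_copy_out / 3600
    return copy_in_hours + copy_out_hours + transit_hours + overhead_hours


# The article's example: a 20MBps source drive feeding a 600MBps appliance.
copy_in = effective_copy_speed(source_read_mb_per_s=20,
                               link_mb_per_s=125,        # hypothetical: Gigabit Ethernet ceiling
                               appliance_write_mb_per_s=600)

total = ship_it_hours(volume_mb=1_000_000,   # hypothetical: 1TB of content
                      speed_copy_in=copy_in,
                      speed_copy_out=100,    # hypothetical copy-out rate, MBps
                      transit_hours=24,      # hypothetical overnight courier
                      overhead_hours=48)     # hypothetical packing and setup time
print(f"copy-in speed: {copy_in} MBps, total: {total:.1f} hours")
```

With these illustrative numbers the copy-in step alone takes almost 14 hours, which shows how a slow source drive can dominate the copy time no matter how fast the appliance is.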
The use of sustained data-transfer rates. Storage devices come in a variety of types such as HDD, SSD, and LTO. Each type is available in different configurations such as a RAID (redundant array of independent disks) of HDDs or SSDs, or an HDD-SSD combination where one or more SSDs are used as a fast read-ahead cache for the HDD array. There are also many different data-transfer interfaces such as SCSI (Small Computer System Interface), SATA (Serial AT Attachment), SAS (Serial Attached SCSI), USB, PCI (Peripheral Component Interconnect) Express, Thunderbolt, and so on. Each of these interfaces supports a different theoretical maximum data-transfer speed.
Figure 2 lists the data-transfer speeds supported by recent editions of some of these controller interfaces.
The effective copy-in/copy-out speed while copying data to/from a storage appliance depends on a number of factors:
• Type of drive. For example, SSDs
are usually faster than HDDs partly
because of the absence of any mov-