of content over the Internet using APIs
(application programming interfaces)
available from both the source and
destination cloud providers.6 Let’s call
this strategy “Transfer It!”
This article compares these alternatives,
with respect to time and cost,
to the baseline technique of uploading
the data to the cloud server using
an Internet connection. This baseline
technique is called “Upload It!”
for short.
A Real-Life Scenario
Suppose, purely for the sake of
illustration, that you want to upload
your content into the Amazon S3 (Simple
Storage Service) cloud, specifically
its datacenter in Oregon.2 This could
well be any other cloud-storage service provided by players9 in this space
such as (but not limited to) Microsoft,
Google, Rackspace, and IBM. Also,
let’s assume that your private datacenter is located in Kansas City, MO,
which happens to be approximately
geographically equidistant from Amazon’s datacenters2 located in the
eastern and western U.S.
Kansas City is also one of the few
places where a gigabit-speed optical-fiber service is available in the U.S. In
this case, it’s offered by Google Fiber.7
As of November 2015, Google Fiber
offers one of the highest speeds that
an ISP can provide in the U.S.: 1Gbps
(gigabit per second), for both upload
and download.13 Short of having access to a leased Gigabit Ethernet11 line,
an optical fiber-based Internet service
is a really, really fast way to shove bits
up and down Internet pipes anywhere
in the world.
Assuming an average sustained
upload speed of 800Mbps on such a
fiber-based connection13 (that is, 80%
of its advertised theoretical maximum
speed of 1Gbps), uploading 1TB of
data from Kansas City to S3 storage
in Oregon will require almost three
hours. This is actually pretty
quick (assuming, of course, your
connection never slows down). Moreover,
as the size of the data increases, the
upload time increases in the same
ratio: 20TB requires 2.5 days to upload,
50TB requires almost a week,
and 100TB requires twice that
long. At the other end of the scale,
half a petabyte of data requires two
over the Internet is closer than you
might think.
To illustrate, let’s say you have 1TB of
business data to migrate to cloud storage from your self-managed datacenter.
You are signed up with a business plan
with your ISP that guarantees you an upload speed of 50Mbps and a download
speed of 10 times as much. All you need
to do is announce a short system-downtime window and begin hauling your
data up to the cloud. Right?
Not quite.
For starters, you will need a whopping 47 hours to finish uploading 1TB
of data at a speed of 50Mbps—and
that’s assuming your connection never drops or slows down.
If you upgrade to a faster upload plan,
say 100Mbps, you can finish the job
in one day. But what if you
have 2TB of content to upload, or
4TB, or 10TB? Even at a 100Mbps sustained
data-transfer rate, you will need
a mind-boggling 233 hours to move
10TB of content!
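The arithmetic behind these figures can be checked with a short calculation. The sketch below is illustrative only: the function name `upload_hours` is hypothetical, and the unit convention is an assumption, since the text does not state one. Treating both prefixes as binary (1TB = 2^40 bytes, 1Mbps = 2^20 bits per second) reproduces the 47-hour and 233-hour figures; with decimal prefixes, 1TB at 50Mbps would come out nearer 44 hours.

```python
def upload_hours(size_tb: float, speed_mbps: float) -> float:
    """Hours needed to upload size_tb terabytes at a sustained speed_mbps."""
    # Assumption: binary prefixes on both sides (1TB = 2**40 bytes,
    # 1Mbps = 2**20 bits per second). This convention is not stated in
    # the article; it is chosen because it matches the quoted figures.
    bits = size_tb * 2**40 * 8
    bits_per_second = speed_mbps * 2**20
    return bits / bits_per_second / 3600


# The three cases discussed above, assuming the link never drops or slows.
for size_tb, speed_mbps in [(1, 50), (1, 100), (10, 100)]:
    hours = upload_hours(size_tb, speed_mbps)
    print(f"{size_tb}TB at {speed_mbps}Mbps: {hours:.1f} hours")
```

Because the result scales linearly with data size and inversely with link speed, doubling the data doubles the time, while doubling the speed halves it.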
As you can see, conventional wisdom breaks down at terabyte and petabyte scales. It’s necessary to look at
alternative, nonobvious ways of dealing with data of this magnitude.
Here are two such alternatives available today for moving big data:
• Copy the data locally to a storage
appliance such as LTO (Linear Tape-Open) tape, HDD (hard-disk drive), or
SSD (solid-state drive), and ship it to
the cloud provider. For convenience,
let’s call this strategy “Ship It!”
• Perform a cloud-to-cloud transfer
Figure 2. Data transfer speeds supported by various interfaces.
Interface Type              Data Transfer Speed (Gbps)                              Reference
SATA Revision 3             6                                                       17
SAS-3                       12                                                      10
SuperSpeed USB (USB 3.0)    10                                                      20
PCI Express version 4       15.754 (single data lane) to 252.064 (16 data lanes)    14
Thunderbolt 2               20                                                      1
Figure 1. Data flow when copying data to a storage appliance.
[Diagram. Your Server: source disk (HDD/SSD/LTO tape, etc.), disk controller (SATA/SAS/PCI Express/Thunderbolt, etc.), host, and host controller (e.g., USB, WiFi, SATA, Ethernet NIC, Thunderbolt, PCI Express-to-Gigabit Ethernet, PCI Express-to-Fibre Channel). Link: optical fiber, copper cable, or wireless, or a directly "pluggable" drive. Storage Appliance: host controller, host, disk controller, and disk.]