Technology | DOI: 10.1145/1378727.1378733
Scalable and distributed video coding offers
the promise of two-way, real-time video.
Larissa, a Brazilian foreign-language student studying in Tokyo, gets a call on her cell phone just as she arrives at her apartment after classes.
She peers at the phone’s display and
sees her mother sitting in the living
room of the family’s home in São Paulo, plus a blinking blue dot indicating
the call is a live, two-way video stream.
Larissa flips open her phone.
“Mama, do you like my new haircut?” Larissa asks as she lets herself
into her apartment. “Is it too short?”
“No, it looks terrific,” says her mother. “I have some video of your father’s birthday party. Please turn on your television.”
“Okay,” replies Larissa, who points
her cell phone at the 50-inch, flat-panel
television on her living room wall and
pushes a button. The television flashes
awake, picks up the video stream from
the phone, and displays a high-quality
video of her family celebrating her father’s 49th birthday at his favorite restaurant in São Paulo.
One phone call, one stream of information. The cell phone takes only
the data it needs for its two-inch display while the 50-inch television monitor takes far more data for its greater
resolution—all from the same video stream.
Welcome to the future world of scalable, distributed video.
Digital video coding compresses
the original data into fewer bits while
achieving a prescribed picture quality,
which it accomplishes largely by eliminating redundancies. Image data for a
static background object, for instance,
is stored just once, with subsequent
frames merely pointing back to the
original and registering only incremental changes.
Today’s video coding paradigm
exploits temporal and spatial redundancies—think of them together as
repetitive elements over time—with a
series of predictions, a set of representations, and a slew of cosine calculations. The goals are to remove the details the human eye can’t see (whether
they’re too fast, dark, or small), set
aesthetic rules (such as color and aspect ratio), tailor the bit and frame
rates for the highest picture quality at
the lowest file size, and save as much
bandwidth as possible.
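The "slew of cosine calculations" refers to the discrete cosine transform (DCT). A bare-bones, unnormalized DCT-II sketch in Python (illustrative only; real codecs use fast integer 2D transforms) shows why it helps: a smooth run of pixels concentrates its energy in the first few coefficients, which is what makes later quantization so effective.

```python
import math

def dct(signal):
    """Unnormalized DCT-II: project the signal onto cosine basis functions."""
    n = len(signal)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(signal))
            for k in range(n)]

# A smooth row of pixel values (made-up sample data):
row = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(row)
# coeffs[0] equals sum(row); the high-frequency coefficients are small
# and can be coarsely quantized or dropped with little visible loss.
```

In the extreme case of a perfectly flat region, every coefficient except the first is zero, so almost nothing needs to be transmitted.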
A video stream is broken up into pictures that are not necessarily encoded
in the order in which they are played
back. Encoders append such commands as “for blocks 37–214, duplicate
the same blocks in the last frame,” and
quantize the transform coefficients to
control for the limitations of human
visual perception. Finally, entropy coding squeezes out the statistical redundancy of the resulting coded symbols.
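These last two stages can be sketched as follows, with made-up coefficient values and a toy run-length scheme standing in for real entropy coding (which uses arithmetic or variable-length codes):

```python
def quantize(coeffs, step):
    """Lossy stage: coarser steps discard detail the eye won't miss."""
    return [round(c / step) for c in coeffs]

def run_length(symbols):
    """Lossless stage: collapse the long runs of zeros that quantization creates."""
    out, zeros = [], 0
    for s in symbols:
        if s == 0:
            zeros += 1
        else:
            out.append((zeros, s))  # (zeros skipped, next nonzero value)
            zeros = 0
    out.append((zeros, None))       # trailing zeros and end-of-block marker
    return out

# Hypothetical transform coefficients for one block:
q = quantize([502.0, -12.3, 3.1, 0.4, -0.2, 0.1, 0.0, 0.05], step=8)
# → [63, -2, 0, 0, 0, 0, 0, 0]
compressed = run_length(q)
# → [(0, 63), (0, -2), (6, None)]
```

Eight coefficients shrink to three symbols; the small high-frequency values round to zero under quantization, and the run-length pass represents that run of zeros in a single token.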
It’s not quite instant, but in fairly
short order video encoders produce a
digital video file, a fraction of its original size, for an iPod, laptop, or cell
phone. And with advances in scalable
and distributed video coding, two-way,
real-time video, such as Larissa’s conversation with her mother, is becoming a reality.
Hybrid coding, which leverages both
the temporal/predictive and frequency
domains, is the basis for most current
video standards. It does the hard work
at the encoding step, resulting in complex encoders but just basic decoders.
A downlink model of a few encoders serving many distributed decoders suits TV and cable broadcasting and on-demand Web video very well, but its design goal is keeping decoder complexity low. Today’s challenge, on the other hand, is the proliferation of wireless mobile devices—from cell phones and Internet tablets to laptops—that rely on uplinks to deliver data. This requires capable device-based encoders.
In addition to robust encoding,
these emerging applications require
improved compression and increased