DETECTING STEGANOGRAPHY
ON A LARGE SCALE
by William Ella
Introduction
If asked, most people today would not recognize the word steganography. Even within the academic
and professional societies of computer science, many are unfamiliar with the term. Likewise,
“steganography” is not even included within most spell-checking programs [1]. How could such
an emerging field in computer science remain hidden for so long?
Steganography is the science of hiding information; this is often
confused with the related science of cryptography, which is similar
to a virtual “padlock” that openly challenges an eavesdropper to
break into a message. Unlike cryptography, steganography challenges
an eavesdropper to notice the presence of a message. In practical
terms, this means that steganography can be anywhere; there is
potential for a hidden message wherever information lies. The earliest known example of steganography dates back to ancient Greece. A
general shaved the head of a trusted slave, tattooed a message on top
of his head, and then sent the slave across the country after his hair
grew back. When the slave reached his destination unopposed, he
shaved his head to reveal the hidden message [5]. Other examples of
classic steganography involve the use of invisible ink and
microdots in World War II [2]. In this article, I will first establish the
foundations of information hiding, then describe some elementary
methods used in steganography and steganalysis, and finally explain
the results of a research experiment I performed in order to understand how to undertake large-scale steganalysis.
The Methodology of Information Hiding
The two basic types of modern information hiding are steganography
and watermarking. In general, the goal of watermarking is to put a
secure stamp on a transmitted file; this is often seen in copyrighted multimedia such as sample pictures, online videos, and other types of intellectual property. The presence of the watermark usually does not need
to be hidden, as long as it cannot be removed without destroying the
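original file. In this respect, watermarking resembles cryptography: the protection is in plain view, and its strength lies in resisting removal rather than in concealment.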
Steganography, on the other hand, rests on the complete undetectability of a message. Unlike in watermarking, the host file itself is often unimportant; the sender simply uses an innocent-looking file as a cover for
sending hidden data. However, if a message is even suspected within this
file, then that use of steganography has been broken,
regardless of whether or not the message can be decoded [6].
There are three basic tenets behind hiding information. The first
is capacity, which is the amount of information that can be embedded within the cover file. An information-hiding algorithm has to be
able to store a message compactly within a file; a message that cannot
be found does little good if the amount that can be sent is severely
limited. Next is security, which refers to how easily a third party
can detect hidden information within a file. Intuitively, if a message
is to be hidden, an ideal algorithm would store information in a way
that is very hard to notice. The last is robustness, the amount of
modification a message can withstand before being destroyed by a third party. Of the two types of information hiding, steganography is more concerned with capacity and security, while
watermarking focuses mainly on robustness [5].
Different Media and Techniques
With modern techniques in steganography, hidden digital messages
have the potential to be anywhere. Information can be embedded
into many different types of digital files. Prominent examples are
images, music, executables, and word-processing documents.
Although the techniques may differ from file to file, the main principle of steganography stays the same. The idea is to hide information
within the redundant data of a file. With this in mind, even HTML
code, network traffic, and fax headers can be media for carrying
secret messages [1].
The image is by far the most popular type of file for hiding
information; its presence on the Internet is ubiquitous. Because of the
image’s prominence, most research is focused on hiding information
within image formats such as BMP, GIF, and JPEG. Most information-hiding methods for images use one of the following four techniques: modifying the least-significant bits of a file, masking the data
in noise, scattering the data throughout the file, or embedding data
with the compression algorithms. Least-significant-bit modification takes the least important bits of an image’s color data and
replaces them with the hidden message. In more sophisticated algorithms, bits are selected based on the characteristics of human vision,
making it practically impossible to notice any visual imperfections in
the image. Masking data in noise distributes hidden data
throughout the image file. This can be achieved in many ways; one
interesting example is the “patchwork method,” in which pairs of
pixels or patches are randomly selected. The first member of each pair has
its pixel values lowered by a slight constant, while the second has its values
raised by the same amount. Scattering data throughout the file, also
known as the spread-spectrum technique, uses transformation algorithms to spread message data evenly throughout an entire cover file.
Finally, compression-based methods embed data by manipulating the
steps used to make an image file smaller in size [6]. All of these
techniques can be generalized to other file types as well [7].
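As a rough illustration of the least-significant-bit technique described above, the following Python sketch hides a message in a sequence of 8-bit color samples. It is a minimal sketch under stated assumptions, not the method of any particular tool: the function names embed_lsb and extract_lsb, the flat byte sequence standing in for decoded pixel data, and the 4-byte length header are all hypothetical choices made for this example. It uses plain sequential replacement rather than the vision-based bit selection mentioned earlier.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Hide `message` in the least-significant bits of `cover`.

    `cover` is assumed to be raw 8-bit samples (e.g. decoded RGB values).
    """
    # Assumption for this sketch: prefix a 4-byte big-endian length header
    # so the receiver knows how many bytes to read back.
    payload = len(message).to_bytes(4, "big") + message
    # Unpack the payload into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("message exceeds cover capacity (one bit per sample)")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes) -> bytes:
    """Recover a message written by embed_lsb."""

    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                # Rebuild each byte from eight consecutive sample LSBs.
                value = (value << 1) | (stego[start_bit + b * 8 + k] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


cover = bytes(range(256))          # stand-in for real pixel data
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego))          # prints b'hi'
```

Each sample changes by at most one intensity level, which is why the alteration is hard to see. The capacity of this scheme is one bit per sample, so a 640 x 480 image with three color channels could carry at most 921,600 bits (115,200 bytes), including the length header.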
Detecting Hidden Information
Though information can be soundly hidden within innocent files, every
technique leaves an inadvertent trail in the cover file. Steganalysis is