that capture page popularity trends to
sophisticated time-series methods that
describe access patterns across multiple user sessions. These insights inform
marketing initiatives, content hosting,
and resource provisioning.
A variety of statistical techniques
have been used for profiling and reporting on log data. Clustering algorithms such as k-means and hierarchical clustering group similar events.
Markov chains have been used for pattern mining where temporal ordering
is essential.
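As a minimal sketch of the idea, a first-order Markov chain can be estimated by counting how often one event type follows another within a time-ordered session. The session data and the function name `transition_probabilities` below are illustrative assumptions, not taken from any particular tool.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from event sequences."""
    counts = defaultdict(Counter)
    for events in sessions:
        # Count each observed (current event -> next event) pair, in order.
        for current, following in zip(events, events[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {nxt: n / total for nxt, n in followers.items()}
    return probabilities

# Hypothetical sessions: each is a time-ordered list of event types.
sessions = [
    ["login", "search", "view", "logout"],
    ["login", "view", "logout"],
    ["login", "search", "search", "view"],
]
model = transition_probabilities(sessions)
print(model["login"])  # probabilities of each event type that follows "login"
```

Because the counts depend on which event follows which, shuffling the events within a session would change the model. That sensitivity to ordering is what distinguishes this approach from clustering events as an unordered set.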
Many profiling and alerting techniques require hints in the form of
expert knowledge. For example, the k-means
clustering algorithm requires
the user either to specify the number of
clusters (k) or to provide example events
that serve as seed cluster centers. Other
techniques require heuristics for merging or partitioning clusters. Most techniques rely on mathematical representations of events, and the results of the
analysis are presented in similar terms.
It may then be necessary to map these
mathematical representations back
into the original domain, though this
can be difficult without understanding
the log semantics.
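The following sketch illustrates both points: the user supplies the seed centers (which also fixes k), and the result comes back as numeric vectors that still have to be interpreted. It assumes, purely for illustration, that each event has already been reduced to a (CPU utilization, memory consumption) pair in the range 0 to 1; the data and names are hypothetical.

```python
import math

def kmeans(points, seeds, iterations=10):
    """Plain k-means (Lloyd's iterations); k is implied by the number of seeds."""
    centers = [tuple(s) for s in seeds]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        assignment = [
            min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, assignment

# Hypothetical events as (cpu, memory) pairs; the seeds encode the expert's hint.
points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
seeds = [(1.0, 0.0), (0.0, 1.0)]
centers, assignment = kmeans(points, seeds)
print(assignment)  # → [0, 0, 1, 1]
print(centers)
```

The returned centers are just coordinates. Reading the first one as "high CPU, low memory" requires knowing what each dimension meant in the original log.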
Classifying log events is often
challenging. To categorize system
performance, for example, you may
profile CPU utilization and memory
consumption. Suppose you have a performance profile for high CPU utilization and low memory consumption,
and a separate profile of events with
low CPU utilization and high memory
consumption; when an event arrives
containing low CPU utilization and
low memory consumption, it is unclear to which of the two profiles (or
both) it should belong. If there are
enough such events, the best choice
might be to include a third profile.
There is no universally applicable rule
for how to handle events that straddle
multiple profiles or how to create such
profiles in the first place.
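The ambiguity can be made concrete with a small sketch. It assumes each profile is summarized by a hypothetical centroid and that an event belongs to every profile whose distance is within some `margin` of the closest one. Both the centroid values and the margin are arbitrary choices made for illustration, which is exactly the kind of heuristic the text says has no universal answer.

```python
import math

# Hypothetical profiles as (CPU utilization, memory consumption) centroids in [0, 1].
profiles = {
    "cpu_bound": (0.9, 0.1),
    "memory_bound": (0.1, 0.9),
}

def classify(event, profiles, margin=0.1):
    """Return the names of all profiles nearly as close as the nearest one."""
    distances = {name: math.dist(event, center) for name, center in profiles.items()}
    best = min(distances.values())
    # Any profile within `margin` of the best distance counts as a candidate.
    return sorted(name for name, d in distances.items() if d - best <= margin)

print(classify((0.85, 0.2), profiles))  # → ['cpu_bound']
print(classify((0.1, 0.1), profiles))   # → ['cpu_bound', 'memory_bound']
```

The low-CPU, low-memory event is equally far from both centroids, so the function returns both names. Counting how often that happens is one possible signal that a third profile is worth adding.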
Although effective for grouping
similar events and providing high-level views of system behavior, profiles do not translate directly to operationally actionable insights. The
task of interpreting a profile and
using it to make business decisions, to
modify the system, or even to modify the
analysis, usually falls to a human.
Logging infrastructures
A logging infrastructure is essential for
supporting the variety of applications
described here. It requires at least two
features: log generation and log storage.
Most general-purpose logs are unstructured text. Developers use printf and
string concatenations to generate messages because these primitives are well
understood and ubiquitous. This kind
of logging has drawbacks, however.
First, serializing variables into text is expensive (almost 80% of the total cost of
printing a message). Second, the analysis needs to parse the text message,
which may be complicated and expensive.
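The parsing drawback can be illustrated with a short sketch that contrasts a printf-style message with a structured record. The message format, field names, and function names are hypothetical. JSON is used only as a convenient stand-in for a structured format, and since it still serializes values to text, the sketch addresses the parsing cost rather than the serialization cost.

```python
import json
import re

def log_text(user, latency_ms):
    # printf-style: values are serialized into free-form text.
    return "request by user %s took %d ms" % (user, latency_ms)

# The analysis side must know the exact message format to recover the fields.
TEXT_PATTERN = re.compile(r"request by user (\S+) took (\d+) ms")

def parse_text(line):
    match = TEXT_PATTERN.fullmatch(line)
    if match is None:
        return None  # any change to the message wording breaks the parser
    return {"user": match.group(1), "latency_ms": int(match.group(2))}

def log_structured(user, latency_ms):
    # Structured alternative: field names travel with the values.
    return json.dumps({"event": "request", "user": user, "latency_ms": latency_ms})

def parse_structured(line):
    return json.loads(line)

print(parse_text(log_text("alice", 42)))
print(parse_structured(log_structured("alice", 42)))
```

With the text form, every distinct message needs its own pattern, and a reworded message silently stops matching. With the structured form, one generic parser recovers the fields of any record.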
On the storage side, infrastructures
such as syslog aggregate messages
from network sources. Splunk indexes
unstructured text logs from syslog and
other sources, and it performs both
real-time and historical analytics on the
data. Chukwa archives data using Hadoop to take advantage of distributed
computing infrastructure.11
Choosing the right log-storage solution involves the following trade-offs:
• Cost per terabyte (upfront and maintenance)
• Total capacity
• Persistence guarantees
• Write access characteristics (for example, bandwidth and latency)
• Read access characteristics (random access vs. sequential scan)
• Security considerations (access control and regulation compliance)
• Integration with existing infrastructure
There is no one-size-fits-all policy for
log retention. This makes choosing and
configuring log solutions a challenge.
Logs that are useful for business intelligence
are typically considered more
important than debugging logs and
thus are kept for a longer time. In contrast,
most debug logs are stored for as
long as possible but without any retention
guarantee, meaning they may be
deleted under resource pressure.
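One way to picture such a policy is the sketch below. It assumes two hypothetical log categories, a fixed retention period for business logs, and a storage capacity beyond which the oldest debug logs are evicted first. The class, field names, and thresholds are invented for illustration and do not describe any specific product.

```python
from dataclasses import dataclass

@dataclass
class LogFile:
    name: str
    kind: str        # "business" or "debug" (hypothetical categories)
    age_days: int
    size_gb: float

def select_for_deletion(files, capacity_gb, business_retention_days=365):
    """Pick files to delete: expired business logs, then oldest debug logs if over capacity."""
    doomed = [
        f for f in files
        if f.kind == "business" and f.age_days > business_retention_days
    ]
    kept = [f for f in files if f not in doomed]
    used = sum(f.size_gb for f in kept)
    # Debug logs carry no retention guarantee: evict oldest first until we fit.
    debug_oldest_first = sorted(
        (f for f in kept if f.kind == "debug"),
        key=lambda f: f.age_days,
        reverse=True,
    )
    for f in debug_oldest_first:
        if used <= capacity_gb:
            break
        doomed.append(f)
        used -= f.size_gb
    return [f.name for f in doomed]

files = [
    LogFile("sales-old", "business", 400, 5.0),
    LogFile("sales-recent", "business", 30, 5.0),
    LogFile("debug-old", "debug", 20, 4.0),
    LogFile("debug-new", "debug", 2, 4.0),
]
print(select_for_deletion(files, capacity_gb=10.0))  # → ['sales-old', 'debug-old']
```

Business logs are removed only when their retention period lapses, whereas debug logs survive only while space allows. Here the 20-day-old debug file is deleted even though it is far younger than any expired business log.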
Log-storage solutions are more useful when coupled with alerting and
reporting capabilities. Such infrastructures can be leveraged for debugging,
security, and other system-management tasks. Various log-storage solutions facilitate alerting and reporting,
but they leave many open challenges
pertaining to alert throttling, report acceleration, and forecasting capabilities.