helped a lot, but we experienced stalled sockets and poor network throughput because of the TCP stack implementation in the 2.4 kernels used at the time.
We tried to find the best balance
between performance and time spent
developing a custom protocol, so we
still used the native Java serialization.
Because of the initial aim to react in almost real time, we had to develop our own keep-alive mechanism at the application level; we could not control the one at the kernel level and also had problems with it. Implementing our
own communication protocol over
standard TCP sockets helped us to
have finer control over network I/O errors for quick and clean recovery. Although the TCP implementation5 has changed in the latest 2.6
kernels—and even though the default
congestion-control protocol works reasonably
well without any special settings—we
still believe that, depending on the
time constraints for the application,
any remote call protocol will be an issue in WAN environments because of
the intrinsic overhead combined with
network latency.
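As an illustration of the application-level keep-alive described above, the following Java sketch wraps a plain TCP socket with a periodic heartbeat and a read timeout so that a stalled connection is detected and closed quickly. The heartbeat marker, message framing, and timing values are assumptions made for the example; they do not reproduce the actual MonALISA protocol.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative application-level keep-alive over a plain TCP socket.
 * The message format and timing values are assumptions for this sketch,
 * not the MonALISA wire protocol.
 */
public class KeepAliveConnection implements AutoCloseable {
    private static final int HEARTBEAT_PERIOD_SEC = 10;    // assumed heartbeat interval
    private static final int READ_TIMEOUT_MS      = 30000; // declare the peer dead after 30 s of silence

    private final Socket socket;
    private final DataOutputStream out;
    private final DataInputStream in;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private volatile long lastSeen = System.currentTimeMillis();

    public KeepAliveConnection(String host, int port) throws IOException {
        socket = new Socket(host, port);
        socket.setSoTimeout(READ_TIMEOUT_MS);   // reads fail fast instead of stalling forever
        out = new DataOutputStream(socket.getOutputStream());
        in  = new DataInputStream(socket.getInputStream());
        // Periodically send a small heartbeat message at the application level.
        scheduler.scheduleAtFixedRate(this::sendHeartbeat,
                HEARTBEAT_PERIOD_SEC, HEARTBEAT_PERIOD_SEC, TimeUnit.SECONDS);
    }

    private void sendHeartbeat() {
        try {
            synchronized (out) {
                out.writeByte(0x00);   // assumed marker meaning "heartbeat"
                out.flush();
            }
        } catch (IOException e) {
            // Any I/O error is treated as a dead link: close and let the caller reconnect.
            closeQuietly();
        }
    }

    /** Reads one application message; a timeout means the peer is unresponsive. */
    public byte[] readMessage() throws IOException {
        int len = in.readInt();          // assumed length-prefixed framing
        byte[] payload = new byte[len];
        in.readFully(payload);
        lastSeen = System.currentTimeMillis();
        return payload;
    }

    public boolean isAlive() {
        return !socket.isClosed()
                && System.currentTimeMillis() - lastSeen < READ_TIMEOUT_MS;
    }

    private void closeQuietly() {
        try { close(); } catch (IOException ignored) { }
    }

    @Override
    public void close() throws IOException {
        scheduler.shutdownNow();
        socket.close();
    }
}

The important property is that failure detection and recovery stay entirely under application control, independent of the kernel's TCP keep-alive settings.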
At the other end, for the LAN communication between thousands of
monitored entities and the local
MonALISA service, we decided to take
another approach: use a UDP (User
Datagram Protocol)-based binary but
highly portable protocol employing
External Data Representation (XDR)15
for data encoding. This choice proved to be effective and allowed the service
to collect more than 5,000 messages
per second without any loss—TCP
would not have scaled to receive data
simultaneously from all the nodes in
a large computing farm. The ApMon
client library (available in Java, C, Perl,
and Python) that we developed for this
purpose became the preferred way of
tracing remote jobs and nodes, as it
could send not only user-specific data,
but also process and machine monitoring information.
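To make the UDP/XDR approach concrete, here is a minimal Java sketch that packs one monitoring value using XDR-style encoding (big-endian integers and doubles; strings length-prefixed and zero-padded to a four-byte boundary) and sends it in a single datagram. The field layout, host name, and port are illustrative assumptions and do not reproduce the actual ApMon wire format.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/**
 * Minimal sketch of sending one monitoring value over UDP with XDR-style
 * encoding. The datagram layout is an assumption for illustration only.
 */
public class UdpXdrSender {

    /** Encodes a string the way XDR does: 4-byte length, bytes, zero padding to a 4-byte boundary. */
    private static void writeXdrString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.writeInt(bytes.length);
        out.write(bytes);
        int pad = (4 - bytes.length % 4) % 4;
        for (int i = 0; i < pad; i++) out.writeByte(0);
    }

    public static void send(String host, int port,
                            String cluster, String node,
                            String paramName, double value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);

        // Assumed datagram layout: cluster, node, parameter name, then the value.
        writeXdrString(out, cluster);
        writeXdrString(out, node);
        writeXdrString(out, paramName);
        out.writeDouble(value);   // XDR double: 8-byte IEEE 754, big-endian
        out.flush();

        byte[] data = buf.toByteArray();
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical destination and metric, used here only as an example.
        send("monalisa.example.org", 8884, "MyCluster", "node01", "cpu_load", 0.42);
    }
}

Because each value fits in one connectionless datagram, a service can receive such messages from thousands of senders at once, which is what lets this approach scale where per-node TCP connections would not.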
Figure 2. Monitoring the quality of WAN links.
Figure 3. The MonALISA repository for ALICE. Lines represent site relations (Tier0-Tier1-Tier2).
Challenges of a Large, Data-Intensive Scientific Project
One of the largest communities using
the MonALISA system is ALICE (A Large
Ion Collider Experiment),1 one of four LHC (Large Hadron Collider) experiments at CERN (European Organization for Nuclear Research).4 The ALICE
collaboration, consisting of more than
1,000 members from 29 countries and
86 institutes, is strongly dependent on
a distributed computing environment
to perform its physics program. The
ALICE experiment will start running
this year and will collect data at a rate
of up to four petabytes per year. During
its design lifetime of 20 years, ALICE
will produce more than 10⁹ data files
per year, and require tens of thousands
of CPUs to process and analyze them.
The CPU and storage capacities are distributed over more than 80 computing
centers worldwide. These resources
are heterogeneous in all aspects, from
CPU model and count to operating system and batch queuing software. The
allocated resources are expected to grow over time to match the increase in the data-acquisition rate that follows from changes in experiment parameters; a doubling is foreseen within two years, with further growth after that.
The ALICE computing model requires a dedicated node in each com-