Article development led by
queue.acm.org
MonALISA developers describe how it works,
the key design principles behind it, and the
biggest technical challenges in building it.
BY IOSIF LEGRAND, RAMIRO VOICU, CATALIN CIRSTOIU,
COSTIN GRIGORAS, LATCHEZAR BETEV, AND ALEXANDRU COSTAN
Monitoring and Control of Large Systems with MonALISA
The High Energy Physics (HEP) group at the California
Institute of Technology started developing the
MonALISA (Monitoring Agents using a Large
Integrated Services Architecture) framework in 2002,
aiming to provide a distributed service system
capable of controlling and optimizing large-scale, data-intensive applications.11 Its initial target applications are the grid systems and
networks that support data processing
and analysis for HEP collaborations.
Our strategy in trying to satisfy the demands of data-intensive applications
was to move toward more synergetic relationships among the applications,
the computing and storage facilities, and
the network infrastructure.
An essential part of managing
large-scale, distributed data-processing facilities is a system that
monitors, in near real time, the computing facilities, storage,
networks, and the very large number of
applications running on these systems. The monitoring information gathered for all the
subsystems is essential for developing the required higher-level services—the components that provide
decision support and some degree of
automated decisions—and for maintaining and optimizing workflow in
large-scale distributed systems. These
management and global optimization
functions are performed by higher-level agent-based services. Current applications of MonALISA’s higher-level
services include optimized dynamic