a distributed services registry, and
the discovery and notification mechanisms, the services are able to access
each other seamlessly. The use of dynamic remote event subscription allows a service to register an interest
in a selected set of event types, even in
the absence of a notification provider
at registration time.
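The idea of registering interest before any provider exists can be sketched in a few lines. This is a minimal illustration, not MonALISA code: the `EventRegistry` class, its method names, and the `service-joined` event type are all hypothetical.

```python
from collections import defaultdict


class EventRegistry:
    """Hypothetical registry: interest is keyed by event type, not by provider."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        # No provider needs to exist yet; the interest is simply recorded.
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        # Called by any provider that appears later; returns how many were notified.
        callbacks = self._subscribers.get(event_type, [])
        for callback in callbacks:
            callback(event_type, payload)
        return len(callbacks)


registry = EventRegistry()
received = []
registry.subscribe("service-joined", lambda kind, data: received.append((kind, data)))

# A provider shows up only after the subscription was made.
delivered = registry.publish("service-joined", {"name": "farm-01"})
```

Because subscriptions are stored against the event type alone, the subscriber never needs to know which service will eventually emit the event.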
Proxy services make up the third
layer of the MonALISA framework.
They provide an intelligent multiplexing of the information requested
by the clients or other services and
are used for reliable communication
among agents. This layer can also be
used for access-control enforcement
to provide secure access to the collected information and the remote
services management.
Higher-level services and clients access the collected information using
the proxy layer. A location-aware,
load-balancing mechanism is used to
allocate these services dynamically to
the best proxy service. The clients, other services, or agents can get real-time
or historical data by using a predicate
mechanism for requesting or subscribing to selected measured values.
These predicates are based on regular
expressions to match the attribute description of the measured values that a
client is interested in. They may also be
used to impose additional conditions
or constraints for selecting the values.
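A predicate of this kind might look like the following sketch. The field names (`farm`, `node`, `parameter`) and the optional value bounds are assumptions chosen for illustration; the actual attribute description used by MonALISA is not specified here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    farm: str        # hypothetical attribute names
    node: str
    parameter: str
    value: float


@dataclass
class Predicate:
    farm: str = ".*"
    node: str = ".*"
    parameter: str = ".*"
    min_value: Optional[float] = None   # optional additional constraints
    max_value: Optional[float] = None

    def matches(self, m: Measurement) -> bool:
        # Every regular expression must match its whole attribute.
        described = (
            re.fullmatch(self.farm, m.farm)
            and re.fullmatch(self.node, m.node)
            and re.fullmatch(self.parameter, m.parameter)
        )
        if not described:
            return False
        if self.min_value is not None and m.value < self.min_value:
            return False
        if self.max_value is not None and m.value > self.max_value:
            return False
        return True


stream = [
    Measurement("CERN", "node01", "cpu_load", 0.35),
    Measurement("CERN", "node02", "cpu_load", 0.92),
    Measurement("Caltech", "node07", "cpu_load", 0.97),
    Measurement("CERN", "node02", "eth0_in", 120.0),
]

busy_cern_nodes = Predicate(farm="CERN", node="node.*", parameter="cpu_.*", min_value=0.9)
selected = [m for m in stream if busy_cern_nodes.matches(m)]
```

Here only the `CERN`/`node02` CPU reading passes both the pattern match and the value constraint. The same predicate object could serve a one-time query over stored data or a standing subscription on the live stream.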
The subscription requests create a
dedicated priority queue for messages.
The communication with the clients
is served by a pool of threads. The allocated thread performs the matching
tests for all the predicates submitted
by a client with the monitoring values
in the data flow. The same thread is responsible for sending the selected results back to the client as compressed
serialized objects.
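The per-client flow described above (priority queue, predicate matching, compressed serialized results) can be sketched as follows. This is an illustrative sketch only: the `ClientSession` class is hypothetical, `pickle` plus `zlib` stand in for whatever serialization and compression the real system uses, and a list stands in for the network connection.

```python
import pickle
import queue
import re
import zlib
from concurrent.futures import ThreadPoolExecutor


class ClientSession:
    """Hypothetical per-client state: regex predicates plus a priority queue."""

    def __init__(self, patterns):
        self.patterns = [re.compile(p) for p in patterns]
        self.inbox = queue.PriorityQueue()   # entries are (priority, sequence, record)
        self.sent = []                       # stands in for the client connection
        self._seq = 0

    def offer(self, priority, record):
        # The sequence number keeps equal priorities in arrival order.
        # (Single producer assumed; a lock would be needed for several.)
        self._seq += 1
        self.inbox.put((priority, self._seq, record))

    def serve(self):
        # Runs in one pool thread: match, serialize, compress, "send".
        while True:
            _, _, record = self.inbox.get()
            if record is None:               # sentinel: stop serving this client
                break
            name, _value = record
            if any(p.fullmatch(name) for p in self.patterns):
                self.sent.append(zlib.compress(pickle.dumps(record)))


session = ClientSession([r"CERN/.*/cpu_load"])
session.offer(1, ("CERN/node01/cpu_load", 0.35))
session.offer(1, ("Caltech/node07/cpu_load", 0.97))
session.offer(0, ("CERN/node02/cpu_load", 0.92))   # lower number = higher priority
session.offer(float("inf"), None)

with ThreadPoolExecutor(max_workers=4) as pool:
    pool.submit(session.serve).result()

decoded = [pickle.loads(zlib.decompress(blob)) for blob in session.sent]
```

In this sketch the queue is filled before the worker starts, so the priority-0 record is delivered first and the non-matching Caltech record is dropped.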
Serving clients with independent threads allows the information
they need to be sent quickly and reliably,
avoiding the interference caused by
communication errors that may occur
with other clients. In case of communication problems, these threads will try
to reestablish the connection or clean
up the subscriptions for a client or service that is no longer active.
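The reconnect-or-clean-up behavior can be sketched as below. The retry count and the exponential backoff are assumptions added for illustration, since the text does not say how many attempts are made or how they are spaced; `ClientLink` and `recover_or_cleanup` are hypothetical names.

```python
import time


class ClientLink:
    """Hypothetical link to one client; `connect` is any callable that may raise."""

    def __init__(self, connect, max_attempts=3, base_delay=0.01):
        self.connect = connect
        self.max_attempts = max_attempts
        self.base_delay = base_delay

    def reconnect(self):
        for attempt in range(self.max_attempts):
            try:
                return self.connect()
            except ConnectionError:
                # Assumed policy: wait a little longer after each failure.
                time.sleep(self.base_delay * (2 ** attempt))
        return None


def recover_or_cleanup(client_id, link, subscriptions):
    # Try to restore the connection; if that fails, drop the client's subscriptions.
    channel = link.reconnect()
    if channel is None:
        subscriptions.pop(client_id, None)
    return channel


attempts = {"flaky": 0}


def flaky_connect():
    attempts["flaky"] += 1
    if attempts["flaky"] < 3:
        raise ConnectionError("link down")
    return "channel-to-flaky"


def dead_connect():
    raise ConnectionError("client gone")


subscriptions = {"flaky": ["cpu_.*"], "dead": ["eth.*"]}
flaky_channel = recover_or_cleanup("flaky", ClientLink(flaky_connect), subscriptions)
dead_channel = recover_or_cleanup("dead", ClientLink(dead_connect), subscriptions)
```

Because each client has its own link and its own recovery path, a client that never comes back costs only its own thread's retries and leaves other clients' subscriptions untouched.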
Communication Lessons
One of the most difficult parts in developing the MonALISA system was
the communication mechanism for
all these services in the wide area network. The system tries to establish
and maintain reliable communication among services, using the ability to reconnect automatically or find
alternative services in case of network
or hardware problems. Although the
fashion of the time was to implement
remote call protocols over XML and to
use Web services, we decided to use a
binary protocol especially to avoid the
overhead of wrapping everything in
a text-based protocol and because of
the lack of remote notification except
for a pull-based approach (the OASIS
Web Services Notification14 appeared
later and still used a pull-based approach in the first implementations).
Although XML or Web services still
make perfect sense for certain applications, they are not appropriate for
large dynamic data.
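The size overhead of a text-based encoding can be illustrated with a toy comparison. The message layout below (timestamp, node ID, value) and the XML element names are hypothetical and are not the MonALISA wire format; the point is only the relative size of the two encodings of the same measurement.

```python
import struct
import xml.etree.ElementTree as ET


def encode_binary(timestamp, node_id, value):
    # Network byte order, no padding:
    # 8-byte unsigned timestamp, 4-byte unsigned node id, 8-byte double.
    return struct.pack("!QId", timestamp, node_id, value)


def encode_xml(timestamp, node_id, value):
    elem = ET.Element("result")
    ET.SubElement(elem, "timestamp").text = str(timestamp)
    ET.SubElement(elem, "node").text = str(node_id)
    ET.SubElement(elem, "value").text = repr(value)
    return ET.tostring(elem)


sample = (1_250_000_000_000, 42, 0.92)
binary_msg = encode_binary(*sample)
xml_msg = encode_xml(*sample)

# The binary form round-trips to exactly the same values.
restored = struct.unpack("!QId", binary_msg)
```

The binary record is a fixed 20 bytes, while the XML form of the same measurement is several times larger before any envelope or transport headers are added. For a stream of many small values, that per-message difference adds up in both bandwidth and parsing time.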
Initially we used Java RMI (remote method invocation)
as the communication protocol between clients
and services. It was an elegant solution and helped us in the beginning
to develop the other components of
the framework without focusing too
much on the underlying communication protocol. As soon as we started
deploying the monitoring service on
more and more sites, however, we
had to replace this approach for two
main reasons. The first was security
concerns for the computing centers
within HEP and the difficulty of opening
ports in the firewalls of those centers
for incoming TCP connections. In
some cases even the outgoing connectivity had to be restricted to a few
IP addresses and ports. This was in
fact the main reason for developing
the layer of proxy services, allowing all
the other MonALISA services to communicate with each other even when
running behind firewalls or local NAT
(network address translation) environments.
The second reason we had to replace RMI was its relatively
low performance and stability over WAN
connections (see Figure 2). The main
operating system used in the HEP
community was and still is Linux,
but in different flavors (kernels
and libraries) and, of course, under
heterogeneous administration. Java