High Energy and Nuclear Physics
Development and Use of MonALISA High Level Monitoring Services for
the STAR Unified Meta-Scheduler
E. Efstathiadis, L. Hajdu, J. Lauret, and I. Legrand (CalTech)
As a Particle Physics Data Grid (PPDG) cross-team project, we study,
develop, implement and evaluate a set of tools that allow
Meta-Schedulers to take advantage of a consistent set of shared
information (such as information needed for complex decision making
mechanisms) across both local and Grid Resource Management Systems. We
demonstrate the usefulness of such tools within the MonALISA monitoring
framework and the STAR Unified Meta-Scheduler.
We define the requirements and schema by which one can consistently
provide queue attributes for the most common batch systems, and we
evaluate the best scalable, lightweight approach for accessing the
monitored parameters from a client perspective, in particular the
feasibility of accessing real-time and aggregate information. Client
programs are envisioned to function in a non-centralized, fault-tolerant
fashion. We believe that such developments could greatly benefit Grid
laboratory efforts such as Grid3+ and the Open Science Grid (OSG).
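As a rough illustration of what a batch-system-neutral queue record could look like, the sketch below maps raw values from two batch systems onto one common structure. The `QueueAttributes` class, its field names, and the raw key names in `RAW_KEYS` are all hypothetical and chosen for illustration only; they are not the schema defined in this work and are not guaranteed to match the output of any batch system command.

```python
from dataclasses import dataclass

# Assumed raw key names per batch system (illustrative only).
RAW_KEYS = {
    "LSF": {"queue": "QUEUE_NAME", "run": "RUN", "pend": "PEND", "limit": "RUNLIMIT"},
    "PBS": {"queue": "Queue", "run": "Run", "pend": "Que", "limit": "Walltime"},
}


@dataclass(frozen=True)
class QueueAttributes:
    """Hypothetical common description of one batch queue."""
    site: str             # site or cluster hosting the queue
    batch_system: str     # key into RAW_KEYS, e.g. "LSF" or "PBS"
    queue: str            # queue name as known to the batch system
    running_jobs: int
    pending_jobs: int
    max_wallclock_s: int  # wall-clock limit in seconds
    timestamp: float      # time of measurement, seconds since the epoch

    def load(self) -> float:
        """Fraction of jobs that are waiting; 0.0 for an empty queue."""
        total = self.running_jobs + self.pending_jobs
        return self.pending_jobs / total if total else 0.0


def from_raw(site: str, batch_system: str, raw: dict, timestamp: float) -> QueueAttributes:
    """Translate one batch system's raw values into the common record."""
    keys = RAW_KEYS[batch_system]
    return QueueAttributes(
        site=site,
        batch_system=batch_system,
        queue=str(raw[keys["queue"]]),
        running_jobs=int(raw[keys["run"]]),
        pending_jobs=int(raw[keys["pend"]]),
        max_wallclock_s=int(raw[keys["limit"]]),
        timestamp=timestamp,
    )
```

With a record of this kind, a client can compare queues from different batch systems without knowing which system produced the numbers.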
The MonALISA (Monitoring Agents in A Large Integrated Services
Architecture) system provides a distributed monitoring service. It is
based on a scalable Dynamic Distributed Services Architecture (DDSA)
that is implemented using JINI/JAVA and WSDL/SOAP technologies. The
scalability of the system derives from the use of autonomous,
multi-threaded station servers to host a variety of loosely coupled,
self-describing, dynamic services, the ability of each service to
register itself and then to be discovered and used by other services or
clients that require such information, and the ability of all services
and clients subscribing to a set of events (state changes) in the system
to be notified automatically. The framework integrates several existing
monitoring tools and procedures to collect parameters describing
computational nodes, applications and network performance. It has
built-in SNMP support and network-performance monitoring algorithms that
enable it to monitor end-to-end network performance, as well as the
performance and state of site facilities in a Grid.
The core of the MonALISA monitoring service is a multithreaded system
that performs the many data collection tasks in parallel and
independently of one another. It is designed to easily integrate
existing monitoring tools and procedures and to provide this information
in a dynamic, self-describing way to any other services or clients.
MonALISA services are organized in groups, and their group attribute is
used for registration and discovery. Each service registers with a set
of JINI Lookup Discovery Services (LUSs) as a member of a group and
with a set of attributes. The LUSs are also JINI services and may be
registered with other LUSs, resulting in a distributed and reliable
network for registration of services. Services also provide the code
base for the proxies that other services or clients need to
instantiate in order to use them.
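The group-based registration and discovery described above can be sketched with a small in-memory model. This is not the JINI API: the `LookupService` class, the `discover_any` helper, and the service and group names used with them are hypothetical stand-ins meant only to show why registering with several lookup services makes discovery more robust.

```python
from collections import defaultdict


class LookupService:
    """In-memory stand-in for a lookup service (not the JINI API)."""

    def __init__(self):
        # group name -> {service name: attribute dict}
        self._by_group = defaultdict(dict)

    def register(self, name, group, attributes):
        """Register a service as a member of a group, with its attributes."""
        self._by_group[group][name] = dict(attributes)

    def unregister(self, name, group):
        """Remove a service from a group; unknown names are ignored."""
        self._by_group[group].pop(name, None)

    def discover(self, group, **required):
        """Names of services in `group` whose attributes equal `required`."""
        return sorted(
            name
            for name, attrs in self._by_group.get(group, {}).items()
            if all(attrs.get(key) == value for key, value in required.items())
        )


def discover_any(lookup_services, group, **required):
    """Query several lookup services and merge the answers, so a service
    missing from one registry is still found through another."""
    found = set()
    for lus in lookup_services:
        found.update(lus.discover(group, **required))
    return sorted(found)
```

For example, if a service named `monitor-a` is registered with two lookup services and `monitor-b` with only one of them, `discover_any` over both registries still returns both names for their shared group.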
A generic framework for building pseudo-clients for the MonALISA
services was developed. This has been used for creating dedicated web
service repositories with selected information from specific groups of
monitoring services. The pseudo-clients use the same LUS approach to
find all the active MonALISA services from a specified set of groups and
subscribe to these services with a list of predicates and filters. These
predicates or filters specify the information the pseudo-client wants to
collect from all the services. Pseudo-clients store received values from
the running services in a local MySQL database. A Tomcat-based servlet
engine provides a flexible way to present global data and to construct
graphical charts of current or customized historical values on demand.
Multiple Web Repositories can easily be
created to globally describe the services running in a distributed
environment.
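A minimal sketch of the pseudo-client idea follows: incoming values are checked against a list of predicates and the matching ones are stored locally. Several things here are assumptions made for illustration: the `PseudoClient` class, the representation of a predicate as glob patterns on result fields, the field names, and the table layout. The standard-library `sqlite3` module stands in for the MySQL database so that the example runs without a server, and `on_result` is called directly instead of being driven by a real subscription.

```python
import sqlite3
from fnmatch import fnmatchcase


def matches(predicate, result):
    """True if every field named in the predicate matches its glob pattern."""
    return all(fnmatchcase(result[key], pattern) for key, pattern in predicate.items())


class PseudoClient:
    """Hypothetical pseudo-client: filters results and stores them locally."""

    def __init__(self, predicates, db_path=":memory:"):
        self.predicates = predicates
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(service TEXT, node TEXT, parameter TEXT, time REAL, value REAL)"
        )

    def on_result(self, result):
        """Handle one published value; store it only if a predicate matches."""
        if not any(matches(p, result) for p in self.predicates):
            return False
        self.db.execute(
            "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
            (result["service"], result["node"], result["parameter"],
             result["time"], result["value"]),
        )
        self.db.commit()
        return True

    def history(self, service, parameter):
        """Stored (time, value) pairs for one service parameter, oldest first."""
        rows = self.db.execute(
            "SELECT time, value FROM results "
            "WHERE service = ? AND parameter = ? ORDER BY time",
            (service, parameter),
        )
        return rows.fetchall()
```

A client created with the single predicate `{"parameter": "queue_*"}` would store values named `queue_pending` or `queue_running` and ignore everything else; the `history` query is the kind of lookup a charting front end could issue.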
Queue monitoring data collected using custom MonALISA modules at each
site are cached locally using pseudo-clients with fail-over
capabilities. In this way, the monitoring data from all services that
have joined a group are available to the policies implemented in the
Meta-Scheduler, which choose the appropriate queue for each submitted job.
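To make the role of such a policy concrete, the sketch below picks a queue from a list of cached monitoring records. It is only one possible policy, not the one used by the STAR Unified Meta-Scheduler: the `choose_queue` function, the record fields (`queue`, `pending_jobs`, `max_wallclock_s`, `timestamp`), and the freshness threshold are hypothetical.

```python
def choose_queue(queues, job_wallclock_s, now, max_age_s=300.0):
    """Return the name of the preferred queue, or None if none qualifies.

    A queue qualifies if its monitoring record is recent enough and its
    wall-clock limit can accommodate the job. Among qualifying queues the
    one with the fewest pending jobs wins; ties are broken by queue name.
    """
    eligible = [
        q for q in queues
        if now - q["timestamp"] <= max_age_s           # ignore stale records
        and q["max_wallclock_s"] >= job_wallclock_s    # job must fit the limit
    ]
    if not eligible:
        return None
    best = min(eligible, key=lambda q: (q["pending_jobs"], q["queue"]))
    return best["queue"]
```

Ignoring stale records is a deliberate choice in this sketch: a decision based on old cached data may be worse than declining to choose and falling back to a default.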
Figure 1. The MonALISA monitoring service.
In addition, a MonALISA service has been deployed at BNL for the
monitoring needs of the UltraLight project [4]. UltraLight is a
collaboration of experimental physicists and network engineers whose purpose
is to provide the network advances required to enable petabyte-scale
analysis of globally distributed data. Current Grid-based infrastructures
provide massive computing and storage resources, but are limited
by their treatment of the network as an external, passive, and largely
unmanaged resource. The goals of UltraLight are to:
- Develop and deploy prototype global services which broaden existing
Grid computing systems by promoting the network as an actively managed
component.
- Engineer and operate a trans- and intercontinental optical network
testbed, including high-speed data caches and computing clusters, with
U.S. nodes in California, Illinois, Florida, Michigan and Massachusetts,
and overseas nodes in Europe, Asia and South America.
MonALISA can be used to monitor and control network devices, such as
routers and photonic switches. Since it gathers information system-wide,
MonALISA is able to generate global views of the prevailing network
connectivity, to identify network or end-system problems and act on them
strategically, or locally as required. Services that take decisions
based on these (global) system views can be created and deployed: for
example, mobile agents that are able to provide optimized dynamic
routing for distributed applications have recently been added to
MonALISA.
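As a hedged illustration of how a global view of link measurements could feed a routing decision, the sketch below runs Dijkstra's shortest-path algorithm over a table of measured link costs. This is a textbook technique swapped in for illustration and is not a description of how MonALISA's mobile agents work; the `best_path` function, the node names, and the use of a single cost per link are assumptions.

```python
import heapq


def best_path(links, source, target):
    """Dijkstra's algorithm over measured link costs.

    `links` maps (node_a, node_b) -> non-negative cost (for example a
    measured delay) and is treated as undirected. Returns a tuple
    (total_cost, [nodes on the path]) or None if target is unreachable.
    """
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None
```

Re-running `best_path` whenever the monitored costs change gives a simple form of dynamic routing: if the cost of a link on the current path rises sharply, the next call may return a different path.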
Figure 2. The UltraLight testbed includes sites in the Americas, Europe and Asia.
References
- [1] Efstathiadis, E., et al. Development and use of MonALISA high-level
monitoring services for meta-schedulers. Computing in High Energy and
Nuclear Physics (CHEP '04), Interlaken, Switzerland, Sept. 27 - Oct. 1,
2004.
- [2] http://www.ivdgl.org/documents/document_server/uploaded_documents/doc--998--iVDGL_Star_monitoring.ppt
- [3] http://indico.cern.ch/getFile.py/access?contribId=393&sessionId=7&resId=0&materialId=slides&confId=0
- [4] http://ultralight.caltech.edu/web-site/ultralight/html/index.html
- [5] Efstathiadis, E., et al. Status of the QCDOC project at BNL.
International Workshop on QCDOC and BlueGene, Edinburgh, UK, October
4-6, 2005.
- Bennett, G.W., et al. Final report of the E821 muon anomalous
magnetic moment measurement at BNL. Phys. Rev. D. 73 (2006).

Last Modified: January 31, 2008