High Energy and Nuclear Physics

 

Development and Use of MonALISA High Level Monitoring Services for the STAR Unified Meta-Scheduler
E. Efstathiadis, L. Hajdu, J. Lauret, and I. Legrand (CalTech)

As a Particle Physics Data Grid (PPDG) cross-team project, we study, develop, implement and evaluate a set of tools that allow Meta-Schedulers to take advantage of a consistent set of shared information (such as the information needed for complex decision-making mechanisms) across both local and Grid Resource Management Systems. We demonstrate the usefulness of such tools within the MonALISA monitoring framework and the STAR Unified Meta-Scheduler.

We define the requirements and schema by which one can consistently provide queue attributes for the most common batch systems. We also evaluate the most scalable and lightweight approach to accessing the monitored parameters from a client perspective and, in particular, the feasibility of accessing both real-time and aggregate information. Client programs are envisioned to function in a non-centralized, fault-tolerant fashion. We believe that such developments could greatly benefit Grid efforts such as Grid3+ and the Open Science Grid (OSG).
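As an illustration of what a consistent queue schema could look like, the sketch below maps raw queue records from two batch systems onto one common record. It is a minimal sketch, not the schema defined in this project: the `QueueStatus` fields, the `FIELD_MAPS` key names and the `normalize` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueStatus:
    """Batch-system-neutral snapshot of one queue (hypothetical schema)."""
    site: str
    queue: str
    batch_system: str
    running_jobs: int
    pending_jobs: int
    max_wall_hours: float


# Hypothetical raw field names per batch system; the keys reported by
# real batch-system tools differ and would need their own mapping.
FIELD_MAPS = {
    "LSF": {"running_jobs": "RUN", "pending_jobs": "PEND",
            "max_wall_hours": "RUNLIMIT_H"},
    "PBS": {"running_jobs": "state_running", "pending_jobs": "state_queued",
            "max_wall_hours": "walltime_h"},
}


def normalize(site, queue, batch_system, raw):
    """Translate a raw per-batch-system record into the common schema."""
    fields = FIELD_MAPS[batch_system]
    return QueueStatus(
        site=site,
        queue=queue,
        batch_system=batch_system,
        running_jobs=int(raw[fields["running_jobs"]]),
        pending_jobs=int(raw[fields["pending_jobs"]]),
        max_wall_hours=float(raw[fields["max_wall_hours"]]),
    )
```

With a common record of this kind, a client can compare queues from different batch systems without knowing which system produced each value.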

The MonALISA (Monitoring Agents in A Large Integrated Services Architecture) system provides a distributed monitoring service. It is based on a scalable Dynamic Distributed Services Architecture (DDSA) that is implemented using JINI/JAVA and WSDL/SOAP technologies. The scalability of the system derives from three features: the use of autonomous, multi-threaded station servers to host a variety of loosely coupled, self-describing, dynamic services; the ability of each service to register itself and then be discovered and used by other services or clients that require its information; and the automatic notification of all services and clients that subscribe to a set of events (state changes) in the system. The framework integrates several existing monitoring tools and procedures to collect parameters describing computational nodes, applications and network performance. It has built-in SNMP support and network-performance monitoring algorithms that enable it to monitor end-to-end network performance, as well as the performance and state of site facilities in a Grid.

The core of the MonALISA monitoring service is a multithreaded system that performs the many data collection tasks in parallel and independently of one another. It is designed to integrate existing monitoring tools and procedures easily and to provide their information in a dynamic, self-describing way to any other services or clients. MonALISA services are organized in groups, and their group attribute is used for registration and discovery. Each service registers with a set of JINI Lookup Discovery Services (LUSs) as a member of a group, together with a set of attributes. The LUSs are themselves JINI services and may be registered with other LUSs, resulting in a distributed and reliable network for the registration of services. Services also provide the code base for the proxies that other services or clients need to instantiate in order to use them.
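The group-based registration and discovery described above can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not the JINI API: the `LookupService` class, its `register` and `discover` methods, and the `discover_all` helper are hypothetical names invented for the example.

```python
from collections import defaultdict


class LookupService:
    """Toy stand-in for a lookup service (not the JINI API)."""

    def __init__(self):
        # group name -> {service name -> attributes}
        self._by_group = defaultdict(dict)

    def register(self, name, group, attributes):
        """Record a service as a member of `group` with its attributes."""
        self._by_group[group][name] = dict(attributes)

    def discover(self, group, **required):
        """Names of services in `group` whose attributes match `required`."""
        members = self._by_group.get(group, {})
        return sorted(
            name for name, attrs in members.items()
            if all(attrs.get(key) == value for key, value in required.items())
        )


def discover_all(lookup_services, group, **required):
    """Merge answers from several lookup services.

    A service registered with more than one lookup service is still
    found when only one of them is queried or reachable.
    """
    found = set()
    for lus in lookup_services:
        found.update(lus.discover(group, **required))
    return sorted(found)
```

Registering the same service with several lookup services is what gives the redundancy: a client that queries any surviving lookup service still discovers it.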

A generic framework for building pseudo-clients for the MonALISA services was developed. It has been used to create dedicated web service repositories with selected information from specific groups of monitoring services. The pseudo-clients use the same LUS approach to find all the active MonALISA services from a specified set of groups and subscribe to these services with a list of predicates and filters. These predicates and filters specify the information the pseudo-client wants to collect from all the services. Pseudo-clients store the values received from the running services in a local MySQL database. A Tomcat-based servlet engine provides a flexible way to present global data and to construct, on demand, on-the-fly graphical charts for current or customized historical values. Multiple Web Repositories can easily be created to globally describe the services running in a distributed environment.
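The predicate-and-store behaviour of a pseudo-client can be sketched as follows. This is an illustration only: the `PseudoClient` class, the `(farm, parameter)` predicate form and the table layout are assumptions, and an in-memory SQLite database stands in for the local MySQL database so that the example is self-contained.

```python
import sqlite3


class PseudoClient:
    """Toy pseudo-client: keeps only values that match its predicates."""

    def __init__(self, predicates):
        # Each predicate is a (farm, parameter) pair; "*" matches anything.
        self.predicates = list(predicates)
        # SQLite is used here only as a stand-in for the MySQL store.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE results (farm TEXT, parameter TEXT, ts REAL, value REAL)"
        )

    def _wanted(self, farm, parameter):
        return any(
            f in ("*", farm) and p in ("*", parameter)
            for f, p in self.predicates
        )

    def on_result(self, farm, parameter, ts, value):
        """Handle one value sent by a subscribed service; True if stored."""
        if not self._wanted(farm, parameter):
            return False
        self.db.execute(
            "INSERT INTO results VALUES (?, ?, ?, ?)",
            (farm, parameter, ts, value),
        )
        return True

    def latest(self, farm, parameter):
        """Most recent stored value for a farm/parameter pair, or None."""
        row = self.db.execute(
            "SELECT value FROM results WHERE farm = ? AND parameter = ? "
            "ORDER BY ts DESC LIMIT 1",
            (farm, parameter),
        ).fetchone()
        return None if row is None else row[0]
```

Filtering at subscription time keeps the local store small, which matters when one repository follows many services.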

Queue monitoring data collected by custom MonALISA modules at each site are cached locally by pseudo-clients with fail-over capabilities. In this way, the monitoring data from all services that have joined a group are available to the policies implemented in the Meta-Scheduler, which choose the appropriate queue for each submitted job.
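A queue-selection policy fed from fail-over caches might look like the sketch below. The policy shown (fewest pending jobs among queues whose wall-time limit fits the job) is only one possible choice and is not claimed to be the one used by the STAR Unified Meta-Scheduler; the function names and the dictionary keys are assumptions made for the example.

```python
def read_with_failover(caches):
    """Return queue data from the first cache that answers.

    Each cache is a zero-argument callable that either returns a list of
    queue records or raises ConnectionError when it is unreachable.
    """
    for fetch in caches:
        try:
            return fetch()
        except ConnectionError:
            continue  # fall through to the next cache
    raise RuntimeError("no monitoring cache reachable")


def choose_queue(queues, job_hours):
    """Pick the eligible queue with the fewest pending jobs, or None."""
    eligible = [q for q in queues if q["max_wall_hours"] >= job_hours]
    if not eligible:
        return None
    return min(eligible, key=lambda q: q["pending_jobs"])["queue"]
```

For example, if the first cache is unreachable, `read_with_failover` falls through to the next one, and `choose_queue` then ranks whatever queue records that cache returns.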

 


Figure 1.  The MonALISA monitoring service.

In addition, a MonALISA service has been deployed at BNL for the monitoring needs of the UltraLight project [4]. UltraLight is a collaboration of experimental physicists and network engineers whose purpose is to provide the network advances required to enable petabyte-scale analysis of globally distributed data. Existing Grid-based infrastructures provide massive computing and storage resources, but are limited by their treatment of the network as an external, passive, and largely unmanaged resource. The goals of UltraLight are to:

  • Develop and deploy prototype global services which broaden existing Grid computing systems by promoting the network as an actively managed component.
  • Engineer and operate a trans- and intercontinental optical network testbed, including high-speed data caches and computing clusters, with U.S. nodes in California, Illinois, Florida, Michigan and Massachusetts, and overseas nodes in Europe, Asia and South America.

    MonALISA can be used to monitor and control network devices, such as routers and photonic switches. Because it gathers information system-wide, MonALISA is able to generate global views of the prevailing network connectivity, to identify network or end-system problems, and to act on them strategically or locally, as required. Services that make decisions based on these global system views can be created and deployed: for example, mobile agents that provide optimized dynamic routing for distributed applications have recently been added to MonALISA.

     


    Figure 2.  The UltraLight testbed includes sites in the Americas, Europe and Asia.

References

 



 


Last Modified: January 31, 2008