Computation and Data-Driven Discovery

ExaWorks

Modern scientific discovery increasingly relies on workflow applications whose heterogeneous tasks must execute on heterogeneous resources at unprecedented, ever-increasing scale. Executing those applications on high-performance computing (HPC) platforms presents many new challenges. ExaWorks brings together a multi-lab team within the U.S. Department of Energy (DOE) HPC software ecosystem to promote both the integration of existing middleware tools and innovation within the combined scope of those tools. ExaWorks’ Brookhaven Lab team contributes to the project’s two main deliverables: the Portable Submission Interface for Jobs (PSI/J) and the Software Development Kit (SDK).

Portable Submission Interface for Jobs

PSI/J is a Python abstraction layer over cluster schedulers. It exposes a unified application programming interface (API) that lets HPC applications run on the majority of DOE and National Science Foundation (NSF) HPC clusters. PSI/J automatically translates abstract job specifications into the concrete scripts and commands that it sends to the scheduler. PSI/J is built on a well-defined API, is tested on a wide variety of clusters, runs entirely in user space, and uses built-in or community-contributed plugins [1].
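The translation step described above can be illustrated with a small standard-library sketch. This is not PSI/J's implementation or API: the AbstractJobSpec class, the to_slurm and to_pbs functions, and the render helper are hypothetical names introduced here only to show how one scheduler-neutral description might be turned into scheduler-specific batch scripts. The sketch assumes that a job is fully described by a name, an executable, its arguments, a node count, and a walltime.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AbstractJobSpec:
    """Scheduler-neutral job description (hypothetical, not a PSI/J class)."""
    name: str
    executable: str
    arguments: List[str] = field(default_factory=list)
    node_count: int = 1
    walltime_minutes: int = 10


def _hhmmss(minutes: int) -> str:
    """Format a duration in minutes as HH:MM:SS."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


def _command(spec: AbstractJobSpec) -> str:
    """Join the executable and its arguments into one command line."""
    return " ".join([spec.executable, *spec.arguments])


def to_slurm(spec: AbstractJobSpec) -> str:
    """Render the spec as a Slurm batch script using #SBATCH directives."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.name}",
        f"#SBATCH --nodes={spec.node_count}",
        f"#SBATCH --time={_hhmmss(spec.walltime_minutes)}",
        _command(spec),
    ]
    return "\n".join(lines) + "\n"


def to_pbs(spec: AbstractJobSpec) -> str:
    """Render the spec as a PBS batch script using #PBS directives."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {spec.name}",
        f"#PBS -l nodes={spec.node_count}",
        f"#PBS -l walltime={_hhmmss(spec.walltime_minutes)}",
        _command(spec),
    ]
    return "\n".join(lines) + "\n"


# Registry standing in for a plugin mechanism: scheduler name -> translator.
TRANSLATORS: Dict[str, Callable[[AbstractJobSpec], str]] = {
    "slurm": to_slurm,
    "pbs": to_pbs,
}


def render(spec: AbstractJobSpec, scheduler: str) -> str:
    """Translate an abstract spec into a script for the named scheduler."""
    try:
        translator = TRANSLATORS[scheduler]
    except KeyError:
        raise ValueError(f"no translator registered for {scheduler!r}") from None
    return translator(spec)
```

Calling render(spec, "slurm") and render(spec, "pbs") on the same spec yields two different scripts from a single description, which is the portability idea in miniature. The registry dictionary is only a stand-in for a plugin mechanism; a real layer would also submit the script and track the job's state.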

Software Development Kit

The SDK facilitates access to increasingly hardened, scalable, and portable workflow technologies. It provides packaging, testing, and tutorials for a set of core components. Currently, those components are RADICAL-Cybertools, Parsl, Flux, and Swift/T, but any group can apply to add its middleware to the SDK. The SDK also coordinates integration among the core components (e.g., Parsl+RADICAL-Cybertools [2], RADICAL-Cybertools+Flux) to offer added functionality while avoiding the “reinvent-the-wheel” cycle in an already saturated ecosystem of workflow management software.

Publications

[1] https://arxiv.org/abs/2307.07895
[2] https://ieeexplore.ieee.org/abstract/document/10023932/