Computation and Data-Driven Discovery

RECUP: Scalable Metadata and Provenance Services for Reproducible Hybrid Workflows

The ability to reproduce the results of scientific workflows is a key enabler of scientific discovery. The increasing complexity of scientific workflows at extreme scale, together with the integration of artificial intelligence/machine learning (AI/ML) into them, makes replicating results extremely challenging. The inability to replicate these hybrid workflows is a major obstacle to scientific discovery: it impairs researchers’ ability to validate and trust their results and inhibits the uptake of their research outcomes by others. Findable, Accessible, Interoperable, and Reusable (FAIR) data and workflows can be vital enablers in many aspects of scientific discovery. Ensuring the FAIRness of (meta)data (i.e., data and metadata) can reduce barriers to reproducibility by making this information easier to find, interpret, access programmatically, and reuse in new contexts.

The RECUP framework for reproducibility, showing data sources, repository saving intermediate results, and user analysis of performance and result reproducibility.

This project is an important first step toward making high-performance computing (HPC) and AI-enabled workflow applications easily reproducible. Reproducibility, a core tenet of scientific discovery, will support the widespread use of AI applications and workflows at scale in the broader science community by engendering trust in their results. The application of FAIR principles will support reproducibility in the context of scientific workflows while also making these results more accessible to researchers, including those from underserved and minority communities. Successful approaches demonstrated in this work will pave the way for production-grade tools that provide these capabilities to the broad HPC community.

Our multifaceted approach includes a novel data management system for capturing, fusing, storing, and organizing the rich, multimodal information necessary to reproduce hybrid workflows at scale. We are developing schemas that describe the resulting (meta)data and enable convenient access to it. We will demonstrate how this information, used in conjunction with a capable workflow management system, makes it possible to reproduce hybrid workflows that are representative of U.S. Department of Energy (DOE) science activities. Finally, we will demonstrate tools for comparative analysis of workflow executions that isolate where two executions deviated in performance and/or results.
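
To make the idea of comparative analysis concrete, here is a minimal Python sketch of how two recorded workflow executions might be compared task by task. It is an illustration only, not the RECUP schema or tooling: the `TaskRecord` fields, the `digest` and `compare_runs` helpers, and the `slowdown` threshold are all hypothetical names and assumptions chosen for this example.

```python
from dataclasses import dataclass
import hashlib
import json


@dataclass
class TaskRecord:
    """Hypothetical per-task provenance record (not the RECUP schema)."""
    name: str            # task identifier within the workflow
    inputs: dict         # parameter name -> value
    output_digest: str   # hash summarizing the task's output
    runtime_s: float     # wall-clock time in seconds


def digest(obj) -> str:
    """Return a stable SHA-256 digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def compare_runs(run_a, run_b, slowdown=1.5):
    """Walk two executions in task order and report deviations.

    Returns a list of (task_name, kind) tuples, where kind is
    "structure" if the task sequences diverge, "result" if the output
    digests differ, or "performance" if the task in run_b took more
    than `slowdown` times as long as in run_a (an assumed threshold).
    """
    deviations = []
    for a, b in zip(run_a, run_b):
        if a.name != b.name:
            # The workflows no longer line up, so stop comparing.
            deviations.append((a.name, "structure"))
            break
        if a.output_digest != b.output_digest:
            deviations.append((a.name, "result"))
        if b.runtime_s > slowdown * a.runtime_s:
            deviations.append((a.name, "performance"))
    return deviations


# Two toy executions of the same two-task hybrid workflow.
run_a = [
    TaskRecord("simulate", {"seed": 1}, digest([1.0, 2.0]), 10.0),
    TaskRecord("train", {"epochs": 5}, digest({"loss": 0.25}), 40.0),
]
run_b = [
    TaskRecord("simulate", {"seed": 1}, digest([1.0, 2.0]), 11.0),
    TaskRecord("train", {"epochs": 5}, digest({"loss": 0.31}), 90.0),
]

print(compare_runs(run_a, run_b))
# [('train', 'result'), ('train', 'performance')]
```

In this toy case the `simulate` task matches in both runs, while `train` deviates in both its result and its runtime. A production system would need far richer (meta)data, such as software environments, hardware details, and data lineage, than this sketch records.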