New International Consortium Formed to Create Trustworthy and Reliable Generative AI Models for Science




Initiative brings together teams of researchers engaged in creating large-scale generative AI models to address key challenges in advancing AI for science

November 28, 2023

Illustration of large language models in a computer simulation, with a human hand pointing. (Image by Shutterstock.)

The following news release was issued on Nov. 10, 2023, by the U.S. Department of Energy’s Argonne National Laboratory, announcing the new international Trillion Parameter Consortium (TPC), which aims to build large-scale artificial intelligence (AI) systems that will advance AI applications in global scientific research. Brookhaven National Laboratory is among the founding members of this multinational consortium, which is centered on AI and team science and spans laboratories, research and academic institutions, and industry. Shantenu Jha, of Brookhaven Lab’s Computational Science Initiative (CSI), will co-lead the TPC’s Working Group on Model-Pretraining-Runtime-Compute-Performance. The working group will draw on CSI’s existing and expanding research in AI and advanced architectures, high-performance workflows and data integration, system characterization, codesign, and modeling and simulation, which aligns with one of the TPC’s foundational work areas: designing and evaluating model architectures, performance, training, and downstream applications. Shinjae Yoo, lead of CSI’s Machine Learning Group, and Lav Varshney, a joint appointee with CSI’s Computing for National Security group, are also part of the TPC. Through the TPC, Yoo is partnering with Princeton Plasma Physics Laboratory on a fusion foundation model, while Varshney is working on questions of safety and governance toward Responsible AI. CSI, which has long focused on research toward AI solutions that are robust, reproducible, and explainable, will also support TPC-related Responsible AI work. For more information about Brookhaven’s role in the TPC, contact Charity Plata, cplata@bnl.gov, (631) 344-6152.

A global consortium of scientists from federal laboratories, research institutes, academia, and industry has formed to address the challenges of building large-scale artificial intelligence (AI) systems and advancing trustworthy and reliable AI for scientific discovery.

The Trillion Parameter Consortium (TPC) brings together teams of researchers engaged in creating large-scale generative AI models to address key challenges in advancing AI for science. These challenges include developing scalable model architectures and training strategies; organizing and curating scientific data for training models; optimizing AI libraries for current and future exascale computing platforms; and developing deep evaluation platforms to assess progress on scientific task learning, reliability, and trust.

Toward these ends, TPC will:

Build an open community of researchers interested in creating state-of-the-art large-scale generative AI models aimed broadly at advancing progress on scientific and engineering problems by sharing methods, approaches, tools, insights, and workflows.
Incubate, launch, and coordinate projects voluntarily to avoid duplication of effort and to maximize the impact of the projects in the broader AI and scientific community.
Create a global network of resources and expertise to facilitate the next generation of AI and bring together researchers interested in developing and using large-scale AI for science and engineering.

The consortium has formed a dynamic set of foundational work areas addressing three facets of the complexities of building large-scale AI models:

Identifying and preparing high-quality training data, with teams organized around the unique complexities of various scientific domains and data sources.
Designing and evaluating model architectures, performance, training, and downstream applications.
Developing crosscutting and foundational capabilities such as innovations in model evaluation strategies with respect to bias, trustworthiness, and goal alignment, among others.

TPC aims to provide the community with a venue in which multiple large model-building initiatives can collaborate to leverage global efforts, with the flexibility to accommodate the diverse goals of individual initiatives. TPC includes teams that are undertaking initiatives to leverage emerging exascale computing platforms to train large language models (LLMs), or alternative model architectures, on scientific research, including papers, scientific codes, and observational and experimental data, to advance innovation and discovery.

Trillion-parameter models represent the frontier of large-scale AI, with only the largest commercial AI systems currently approaching this scale.

Training LLMs with this many parameters requires exascale-class computing resources, such as those being deployed at several U.S. Department of Energy (DOE) national laboratories and by multiple TPC founding partners in Japan, Europe, and elsewhere. Even with such resources, training a state-of-the-art one-trillion-parameter model will require months of dedicated time, which makes it intractable on all but the largest systems. Consequently, such efforts will involve large, multidisciplinary, multi-institutional teams. TPC is envisioned as a vehicle to support collaboration and cooperative efforts among and within such teams.


Founding partners of the TPC gathered for a kickoff meeting to begin discussing ways to work together to develop generative AI models for scientific discovery. Not pictured are more than 130 remote participants from around the world. (Image by Argonne National Laboratory.)

“At our laboratory and at a growing number of partner institutions around the world, teams are beginning to develop frontier AI models for scientific use and are preparing enormous collections of previously untapped scientific data for training,” said Rick Stevens, associate laboratory director of computing, environment and life sciences at DOE’s Argonne National Laboratory and professor of computer science at the University of Chicago. “We collaboratively created TPC to accelerate these initiatives and to rapidly create the knowledge and tools necessary for creating AI models with the ability to not only answer domain-specific questions but to synthesize knowledge across scientific disciplines.”

The founding partners of TPC are from the following organizations (listed alphabetically by organization, each with a point of contact):

AI Singapore: Leslie Teo
Allen Institute for AI: Noah Smith
AMD: Michael Schulte
Argonne National Laboratory: Ian Foster
Barcelona Supercomputing Center: Mateo Valero Cortes
Brookhaven National Laboratory: Shantenu Jha
Caltech: Anima Anandkumar
CEA: Christoph Calvin
Cerebras Systems: Andy Hock
CINECA: Laura Morselli
CSC - IT Center for Science: Per Öster
CSIRO: Aaron Quigley
ETH Zürich: Torsten Hoefler
Fermi National Accelerator Laboratory: Jim Amundson
Flinders University: Rob Edwards
Fujitsu Limited: Koichi Shirahata
HPE: Nic Dube
Intel: Koichi Yamada
Juelich Supercomputing Center: Thomas Lippert
Kotoba Technologies, Inc.: Jungo Kasai
LAION: Jenia Jitsev
Lawrence Berkeley National Laboratory: Stefan Wild
Lawrence Livermore National Laboratory: Brian Van Essen
Leibniz Supercomputing Centre: Dieter Kranzlmüller
Los Alamos National Laboratory: Jason Pruet
Microsoft: Shuaiwen Leon Song
National Center for Supercomputing Applications: Bill Gropp
National Institute of Advanced Industrial Science and Technology (AIST): Yoshio Tanaka
National Renewable Energy Laboratory: Juliane Mueller
National Supercomputing Centre, Singapore: Tin Wee Tan
NCI Australia: Jingbo Wang
New Zealand eScience Infrastructure: Nick Jones
Northwestern University: Pete Beckman
NVIDIA: Giri Chukkapalli
Oak Ridge National Laboratory: Prasanna Balaprakash
Pacific Northwest National Laboratory: Neeraj Kumar
Pawsey Institute: Mark Stickells
Princeton Plasma Physics Laboratory: William Tang
RIKEN: Makoto Taiji
Rutgers University: Shantenu Jha
SambaNova: Marshall Choy
Sandia National Laboratories: John Feddema
Seoul National University: Jiook Cha
SLAC National Accelerator Laboratory: Daniel Ratner
Stanford University: Sanmi Koyejo
STFC Rutherford Appleton Laboratory, UKRI: Jeyan Thiyagalingam
Texas Advanced Computing Center: Dan Stanzione
Thomas Jefferson National Accelerator Facility: Malachi Schram
Together AI: Ce Zhang
Tokyo Institute of Technology: Rio Yokota
Université de Montréal: Irina Rish
University of Chicago: Rick Stevens
University of Delaware: Ilya Safro
University of Illinois Chicago: Michael Papka
University of Illinois Urbana-Champaign: Lav Varshney
University of New South Wales: Tong Xie
University of Tokyo: Kengo Nakajima
University of Utah: Manish Parashar
University of Virginia: Geoffrey Fox

TPC contact: Charlie Catlett

Learn more at tpc.dev.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.

