The following news release was issued by the U.S. Department of Energy. It announces funding for nine projects that will address management and processing of massive data sets produced by scientific observatories, experimental facilities, and supercomputers that span the DOE national laboratory complex. As part of this program, Brookhaven Lab was awarded $2.4 million in funding over three years. Byung-Jun Yoon of Brookhaven’s Computational Science Initiative will lead the project with university partners. Their work will involve using novel theoretical strategies to develop practical algorithms anchored in scientific objectives, i.e., focused specifically on the goals of interest. These algorithms would bypass scientifically irrelevant information, which could considerably reduce overall data generated by complex experimental or computational systems and streamline any required processing.

DOE Invests Millions for Research in Data Reduction for Science

Research seeks to tame massive data sets to advance scientific discovery

WASHINGTON, D.C.—Today, the U.S. Department of Energy (DOE) announced $13.7 million in funding for nine research projects that will advance the state of the art in computer science and applied mathematics. The projects – led by five universities and five DOE National Laboratories across eight states – will address the challenges of moving, storing, and processing the massive data sets produced by scientific observatories, experimental facilities, and supercomputers, accelerating the pace of scientific discoveries.

As scientific user facilities upgrade and expand, their capacity for generating unwieldy amounts of scientific data has started to exceed scientists’ abilities to stream, archive, and analyze that data. This has created an urgent need to develop new mathematical and computer-science techniques to shrink these data sets by removing trivial or repetitive data while preserving the important scientific information that can lead to discovery.

While the need for data reduction techniques is clear, the scientists using those techniques must trust that they are not losing important scientific information, and this presents a key challenge. Research supported by this program must address not only the efficiency and effectiveness of a data reduction technique but also its trustworthiness.

“Scientific user facilities across the nation, including the DOE Office of Science, are producing data that could lead to exciting and important scientific discoveries, but the size of that data is creating new challenges,” said Barb Helland, Associate Director for Advanced Scientific Computing Research, DOE Office of Science. “Those discoveries can only be uncovered if the data is made manageable, and the techniques employed to do that are trusted by the scientists.”

Projects selected in today’s announcement cover a wide range of topics that promise important innovations in data-reduction techniques, including techniques using advanced machine learning, large-scale statistical calculations, and novel hardware accelerators. A sample of the projects includes:

  • Methods to compress streaming data: Researchers at Oak Ridge National Laboratory will develop techniques to compress data coming directly from a scientific instrument or a computer model by taking advantage of its specific structure and integrating advanced machine-learning techniques, while allowing scientists to control certain features of the data.
  • Methods to intelligently select and tune compression techniques: Researchers at Texas State University will develop techniques to search the vast space of potential data compression techniques and select the best method based on the user’s requirements for fidelity, speed, and memory usage.
  • Compression methods for related groups of data sets: Researchers at the University of California, San Diego will develop scalable techniques for compressing multiple related streams of data, such as those from multiple sensors observing the same physical system, by taking advantage of the relationships between the data sets.
  • Methods for programming custom hardware accelerators for streaming compression: Researchers at Fermi National Accelerator Laboratory will develop techniques for encoding advanced compression and filtering, including those based on machine learning methods, as custom hardware accelerators for use in a wide array of experimental settings, from particle physics experiments to electron microscopes.

The projects are managed by the Advanced Scientific Computing Research (ASCR) program within the DOE Office of Science.

The full list of projects and more information can be found here.
