General Lab Information

Machine Learning Group

Enabling scientific breakthroughs through the development of novel and scalable machine learning algorithms.

CSI’s Machine Learning Group (MLG) focuses on advancing machine learning algorithms and their scientific applications. Our mission is to enable scientific breakthroughs by developing novel and scalable machine learning algorithms. MLG conducts cutting-edge research in areas such as computer vision, natural language processing, streaming and edge computing, and scalable and distributed computing. We also emphasize the interpretability, trustworthiness, responsiveness, and transparency of machine learning algorithms. We work in an interdisciplinary environment, actively collaborating with scientists in domains including biology, chemistry, materials science, high energy physics, and nuclear physics. We also work closely with DOE’s leadership experimental facilities, including the National Synchrotron Light Source II, Relativistic Heavy Ion Collider, and Center for Functional Nanomaterials.

Group Projects

This project aims to advance protein engineering by leveraging the impressive performance of foundation AI models, such as ChatGPT, in language tasks. Our key research question is whether these powerful models can be adapted to predict protein functions and design artificial proteins with desired biological functions.

This project is harnessing novel machine learning algorithms to analyze the substantial data influx generated by X-ray Free Electron Laser (XFEL) experiments. The overarching goal is to distinguish effectively between the genuine fluctuations exhibited by a material sample and the inherent beam fluctuations intrinsic to XFELs.

This work employs a software and hardware co-design approach to build energy-efficient, low-latency neuromorphic networks for particle physics applications. The networks can be fabricated with a conventional custom integrated circuit fabrication process, facilitating AI-enabled application-specific integrated circuits (ASICs).

This project develops machine learning and natural language processing (NLP) methods for automatic table extraction from non-machine-readable documents. Tables are ubiquitous and high-density information resources, yet their contents often cannot be accessed or processed in an automated manner.

Power system operators struggle to plan for events like extreme weather, despite forecasts. Uncertainties in timing, location, and grid vulnerabilities complicate planning. Stochastic optimization algorithms enhance system redispatch in the face of unknown threats. Leveraging modern statistical, artificial intelligence, and machine learning techniques can aid in scenario generation, but uncertain behavior remains a challenge.

This project is developing the infrastructure for Human-AI-interfaced autonomous experiments that integrate synthesis and characterization at a beamline. It builds on recent strides in using Bluesky to orchestrate human-in-the-loop [semi-]autonomous multimodal experiments at National Synchrotron Light Source II beamlines, contemporary flow reactor design for high-throughput synthesis at the Center for Functional Nanomaterials, and AI and algorithmic advancements led at CSI.

The scientific goal of this project is to produce predictive models and reference datasets, offer analytical tools, and demonstrate their utility in DOE biological research relating to bioenergy and biomass production, the carbon cycle, and the study of subsurface microbial communities. The operational goal is to create the infrastructure needed to support the creation, maintenance, and use of predictive models and methods in the study of microbes, microbial communities, and plants.

This project is focused on an end-to-end, 5G-enabled, reliable, and decentralized Internet of Things (IoT) framework with specific goals.

This project investigates the use of an artificial intelligence-directed information distillation algorithm for computational real-time data reduction, including reliable noise filtering, feature extraction, lossy compression, and more.