Computation and Data-Driven Discovery (C3D) Projects

Deep Learning for Analysis of Materials Science Data

Modern scientific instruments generate data at unprecedented rates. In particular, Brookhaven’s new synchrotron (NSLS-II) offers exceptional x-ray brightness and high-speed detectors. The correspondingly high data rate exceeds what human experimenters can interpret manually. High-throughput instruments therefore need a crucial complement: automated analysis methods that can categorize, tag, and analyze scientific data without human intervention. This automation frees scientists to concentrate on high-level scientific questions and to focus on the subset of the data most meaningful for a given problem. It in turn enables more ambitious scientific projects, in particular streamlined materials discovery, in which new materials with desired performance (mechanical, light-harvesting, energy storage, etc.) can be found efficiently.

Figure: This project will use machine-learning methods to analyze x-ray scattering data (bottom). Using structured deep learning methods (right), scientifically meaningful insights will be automatically extracted from the data (left).

This project seeks to build automated, streaming analysis pipelines that extract scientifically meaningful insights from datasets relevant to materials discovery, especially x-ray scattering images. Our strategy is to develop data-analysis pipelines that serve a dual role: they provide experimenters with useful, physically meaningful intermediate results, and they feed those results as inputs to machine-learning methods. We are therefore exploring both new machine-learning algorithms and efficient analysis pipelines, each optimized for scientific data.

We will develop innovative tools for data parameterization and statistical analysis that integrate techniques such as multi-scale hierarchical modeling, global pattern extraction, and local feature extraction. One powerful computational tool is the partial-differential-equation (PDE)-enabled spectral analysis paradigm for data modeling, analysis, and visualization. Using PDE-based heat diffusion theory, we will construct natural multi-scale structures on continuous manifolds or discrete graphs. The advantage of this approach is that it combines the topological information of the model with the multi-scale properties of the data.

These methods will be applied to a variety of ‘images’: the raw detector images generated by instruments, reconstructed maps of sample structures, or the abstract phase spaces of materials science. The automated analysis will in turn be used to automate the scientific experiment itself, by providing feedback to algorithms that efficiently explore scientific problems and decide which experiments to conduct next. Overall, this project aims to deliver a data analysis pipeline for x-ray synchrotron instruments, empowering more ambitious materials discovery experiments.
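The heat-diffusion idea described above for discrete graphs can be illustrated with a small example. This is a minimal sketch, not the project's implementation: it assumes an unweighted, undirected graph and the combinatorial Laplacian L = D - A, and the function names and the toy graph are hypothetical. The heat kernel exp(-tL) describes how heat placed on one node spreads after diffusion time t, so evaluating it at several values of t gives each node a multi-scale descriptor that reflects the graph's topology.

```python
import numpy as np


def laplacian_spectrum(adjacency):
    """Eigen-decomposition of the combinatorial graph Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
    return np.linalg.eigh(L)


def heat_kernel(adjacency, t):
    """Full heat kernel H_t = exp(-t L) = U exp(-t Lambda) U^T."""
    eigvals, eigvecs = laplacian_spectrum(adjacency)
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T


def heat_kernel_signature(adjacency, scales):
    """Diagonal of H_t at each scale: array of shape (n_nodes, n_scales).

    Entry (i, s) is the heat remaining at node i after diffusing for
    time scales[s] when all heat starts at node i.
    """
    eigvals, eigvecs = laplacian_spectrum(adjacency)
    scales = np.asarray(scales, dtype=float)
    return (eigvecs ** 2) @ np.exp(-np.outer(eigvals, scales))


def barbell_graph():
    """Toy graph (an assumption for illustration): two triangles joined by one edge."""
    A = np.zeros((6, 6))
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A


# Small t probes local structure (node degree); large t probes global structure.
signatures = heat_kernel_signature(barbell_graph(), scales=[0.1, 1.0, 10.0])
print(np.round(signatures, 3))
```

In this toy graph the two bridge nodes (2 and 3) lose heat faster at small t than the other nodes because they have more neighbors, while at large t every node approaches the same value. Descriptors of this kind could serve as inputs to clustering or anomaly detection. For large graphs a sparse eigensolver would be a more practical choice than the dense decomposition used here.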

Publications

Kiapour, M.H.; Yager, K.G.; Berg, A.C.; and Berg, T.L. “Materials Discovery: Fine-Grained Classification of X-ray Scattering Images” Winter Conference on Applications of Computer Vision (WACV) 2014.

Huang, H.; Yoo, S.; Kaznatcheev, K.; Yager, K.G.; Lu, F.; Yu, D.; Gang, O.; Fluerasu, A.; and Qin, H. “Diffusion-based Clustering Analysis of Coherent X-ray Scattering Patterns of Self-assembled Nanoparticles”, 29th Symposium On Applied Computing (SAC'14), March 24-28, 2014, Gyeongju, Korea.

Yao, S.; Chang, C.; Xu, W.; Zhou, N.; Chen-Wiegart, Y. K.; Wang, J.; Wang, J.; Yu, D. “NNLSF: A Fast and Informative Fitting Method for XANES Chemical Mapping Analysis”, International Symposium on Biomedical Imaging (ISBI'15), April, 2015.

Huang, H.; Yoo, S.; Yu, D.; and Qin, H. “Density-Aware Clustering based on Aggregated Heat Kernel and Its Transformation”, ACM Transactions on Knowledge Discovery from Data (TKDD), 2015.

Wu, Q.; Lin, X.; Yu, D.; Xu, W.; and Li, L. “End-to-end Delay Minimization for Scientific Workflows in Clouds under Budget Constraint”, IEEE Transactions on Cloud Computing, Special Issue on Autonomic Provisioning of Big Data Applications on Clouds, Volume 3, Number 2, 2015.

Huang, H.; Yoo, S.; Qin, H.; and Yu, D. “Physics-based Anomaly Detection Defined on Manifold Space”, ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 9, Issue 2, 2014.

Huang, H.; Yoo, S.; Yu, D.; and Qin, H. “Noise-Resistant Unsupervised Feature Selection via Multi-Perspective Correlations”, regular paper, IEEE International Conference on Data Mining (ICDM), December 2014.