General Lab Information

Lingda Li

Research Staff 4 Computational, Comp. for Nat'l Sec, Computational Science Initiative

Lingda Li

Brookhaven National Laboratory

Computational Science Initiative
Bldg. 725
P.O. Box 5000
Upton, NY 11973-5000

(631) 344-4693
lli@bnl.gov

Lingda Li is a computer scientist at Brookhaven National Laboratory. He is broadly interested in computer architecture and programming model research, with a recent focus on performance simulation/modeling, memory systems, and machine learning. Before joining BNL, he carried out GPGPU research as a postdoc in the Department of Computer Science at Rutgers University from 2014 to 2016. He received his PhD in computer architecture from the Microprocessor Research and Development Center, Peking University, in 2014.

Research Activities

Recent highlight: Machine learning-based computer architecture simulation

This is the first work to accelerate microarchitecture simulation using machine learning (ML). We first build an ML-based instruction latency prediction framework that accounts for both static instruction properties and dynamic processor states, and then implement a GPU-accelerated parallel simulator on top of this predictor, validating its accuracy and evaluating its throughput against a state-of-the-art simulator. Leveraging modern GPUs, the ML-based simulator significantly outperforms traditional CPU-based simulators. This work was published in SIGMETRICS 2022 and SC 2022. The source code is available at https://github.com/lingda-li/simnet.
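The sketch below illustrates the general idea of an ML-based instruction latency predictor driven in large batches on a GPU. It is a minimal, hypothetical example, not the SimNet code: the class name, feature dimensions, and network shape are all illustrative assumptions, and the actual models and feature encodings used in the published work differ.

```python
# Hypothetical sketch of an ML-based instruction latency predictor.
# All names, dimensions, and the network architecture are illustrative
# assumptions; they do not reproduce the SimNet implementation.
import torch
import torch.nn as nn

class LatencyPredictor(nn.Module):
    """Predict an instruction's latency from its static properties
    (e.g., opcode, operand kinds) and the dynamic processor state it
    observes (e.g., cache/branch outcomes, in-flight instruction context)."""
    def __init__(self, static_dim=32, context_dim=64, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(static_dim + context_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # predicted latency in cycles
        )

    def forward(self, static_features, context_features):
        x = torch.cat([static_features, context_features], dim=-1)
        return self.net(x)

# Batched inference is what makes a GPU-parallel simulator attractive:
# one forward pass predicts latencies for thousands of in-flight
# instructions at once instead of simulating them one by one.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LatencyPredictor().to(device)
static = torch.randn(4096, 32, device=device)   # per-instruction static features
context = torch.randn(4096, 64, device=device)  # per-instruction dynamic context
with torch.no_grad():
    latencies = model(static, context).squeeze(-1)
```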

Ongoing work 1: Machine learning-based solver for cloud simulation
Ongoing work 2: Unified memory space optimization for GPUs and FPGAs
Past work 1: GPU cache and memory optimization
Past work 2: Graph partitioning and sampling
Past work 3: Last-level cache management

Selected Publications

  • Pandey S, Li L, Flynn T, et al. (2022) Scalable Deep Learning-Based Microarchitecture Simulation on GPUs. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1109/sc41404.2022.00084
  • Li L, Pandey S, Flynn T, et al. (2022) SimNet. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6:1–24. https://doi.org/10.1145/3530891
  • Zhang H, Li L, Liu H, et al. (2022) Bring orders into uncertainty. Proceedings of the 36th ACM International Conference on Supercomputing. https://doi.org/10.1145/3524059.3532379
  • Pandey S, Wang Z, Zhong S, Tian C, Zheng B, Li X, Li L, Hoisie A, Ding C, Li D, Liu H (2021) Trust: Triangle Counting Reloaded on GPUs. IEEE Transactions on Parallel and Distributed Systems 32:2646–2660. https://doi.org/10.1109/tpds.2021.3064892
  • Zhang H, Li L, Zhuang D, Liu R, Song S, Tao D, Wu Y, Song SL (2021) An efficient uncertain graph processing framework for heterogeneous architectures. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. https://doi.org/10.1145/3437801.3441584
  • Pandey S, Li L, Hoisie A, Li XS, Liu H (2020) C-SAW: A Framework for Graph Sampling and Random Walk on GPUs. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1109/sc41405.2020.00060
  • Li L, Chapman B (2019) Compiler assisted hybrid implicit and explicit GPU memory management under unified address space. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1145/3295500.3356141
  • Li L, Finkel H, Kong M, Chapman B (2018) Manage OpenMP GPU Data Environment Under Unified Address Space. Lecture Notes in Computer Science 69–81. https://doi.org/10.1007/978-3-319-98521-3_5
  • Li L, Geda R, Hayes AB, Chen Y, Chaudhari P, Zhang EZ, Szegedy M (2017) A Simple Yet Effective Balanced Edge Partition Model for Parallel Computing. Proceedings of the ACM on Measurement and Analysis of Computing Systems 1:1–21. https://doi.org/10.1145/3084451
  • Mishra A, Li L, Kong M, Finkel H, Chapman B (2017) Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading. Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC. https://doi.org/10.1145/3148173.3148184
  • Li L, Hayes AB, Song SL, Zhang EZ (2016) Tag-Split Cache for Efficient GPGPU Cache Utilization. Proceedings of the 2016 International Conference on Supercomputing. https://doi.org/10.1145/2925426.2926253
  • Hayes AB, Li L, Chavarría-Miranda D, Song SL, Zhang EZ (2016) Orion. Proceedings of the 17th International Middleware Conference. https://doi.org/10.1145/2988336.2988355
  • Li L, Lu J, Cheng X (2014) Block value based insertion policy for high performance last-level caches. Proceedings of the 28th ACM International Conference on Supercomputing - ICS '14. https://doi.org/10.1145/2597652.2597653
  • Li L, Tong D, Xie Z, Lu J, Cheng X (2012) Improving inclusive cache performance with two-level eviction priority. 2012 IEEE 30th International Conference on Computer Design (ICCD). https://doi.org/10.1109/iccd.2012.6378668
  • Li L, Tong D, Xie Z, Lu J, Cheng X (2012) Optimal bypass monitor for high performance last-level caches. Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques - PACT '12. https://doi.org/10.1145/2370816.2370862
