Ozgur Kilic
Computational Research Scientist, Computational Science Department
Brookhaven National Laboratory
Computational Science
Bldg. 725, Room 2-113
P.O. Box 5000
Upton, NY 11973-5000
(631) 344-3203
okilic@bnl.gov
Pronouns: He/Him
Dr. Ozgur Ozan Kilic is a Computational Research Scientist at Brookhaven National Laboratory (BNL), where he focuses on advancing the performance, scalability, and adaptability of scientific workflows and high-performance computing (HPC), with an emphasis on AI/ML-driven and AI-coupled workflows.
Expertise
- Scientific Workflow Systems and Performance Engineering
- Distributed, Adaptive, and Intelligent Scheduling for HPC
- AI/ML-Integrated Scientific Workflows
- Performance Modeling, Portability, and Reproducibility
- Resilience and Fault Tolerance in Large-Scale Computing
Research Activities
- Develop systems and abstractions for composable and heterogeneous scientific workflows across HPC and distributed environments
- Design adaptive and asynchronous execution strategies for workflows integrating simulation, data, and AI/ML components
- Advance performance observability and analysis for complex workflows, enabling deeper insight into execution behavior across heterogeneous systems
- Create performance models and surrogate-based methods to understand, predict, and optimize workflow execution at scale
- Enable reproducibility and portability of large-scale workflows through interoperable and simplified representations (e.g., workflow abstractions and mini-apps)
- Investigate memory- and system-level performance characteristics to inform efficient execution of data-intensive and workflow-driven applications
Education
He earned his Ph.D. in Computer Science at the State University of New York at Binghamton.
Selected Publications
- Nicolae B, Islam TZ, Ross R, et al (2023) Building the I (Interoperability) of FAIR for Performance Reproducibility of Large-Scale Composable Workflows in RECUP. 2023 IEEE 19th International Conference on e-Science (e-Science). https://doi.org/10.1109/e-science58273.2023.10254808
- Pascuzzi VR, Kilic OO, Turilli M, Jha S (2023) Asynchronous Execution of Heterogeneous Tasks in ML-Driven HPC Workflows. Lecture Notes in Computer Science 27–45. https://doi.org/10.1007/978-3-031-43943-8_2
- Kilic OO, Tallent NR, Suriyakumar Y, et al (2022) MemGaze: Rapid and Effective Load-Level Memory Trace Analysis. 2022 IEEE International Conference on Cluster Computing (CLUSTER). https://doi.org/10.1109/cluster51413.2022.00058
- Márquez A, Tallent N, Kilic O, et al (2022) Fixing Amdahl's Law within the Limits of Accelerated Systems: FALLACY. Office of Scientific and Technical Information (OSTI)
- Kilic OO, Tallent NR, Friese RD (2020) Rapid Memory Footprint Access Diagnostics. 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). https://doi.org/10.1109/ispass48437.2020.00047
- Kilic OO, Tallent NR, Friese RD (2019) Rapidly Measuring Loop Footprints. 2019 IEEE International Conference on Cluster Computing (CLUSTER). https://doi.org/10.1109/cluster.2019.8891025
- Kilic O, Doddamani S, Bhat A, et al (2018) Overcoming Virtualization Overheads for Large-vCPU Virtual Machines. 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). https://doi.org/10.1109/mascots.2018.00042
- Park DK, Ren Y, Kilic OO, et al (2024) AI Surrogate Model for Distributed Computing Workloads. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis 79–86. https://doi.org/10.1109/scw63240.2024.00018
- Yokelson D, Titov M, Ramesh S, et al (2024) Enabling Performance Observability for Heterogeneous HPC Workflows with SOMA. Proceedings of the 53rd International Conference on Parallel Processing 220–230. https://doi.org/10.1145/3673038.3673100
- Kilic OO, et al (2024) Workflow Mini-Apps: Portable, Scalable, Tunable & Faithful Representations of Scientific Workflows. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid). IEEE
- Feng S, Kim J, Yang Y, et al (2026) Alternative mixed integer linear programming optimization for joint job scheduling and data allocation in grid computing. Future Generation Computer Systems 175:108075. https://doi.org/10.1016/j.future.2025.108075
- Kilic OO, Park DK, Ren Y, et al (2025) Towards an Introspective Dynamic Model of Globally Distributed Computing Infrastructures. EPJ Web of Conferences 337:01082. https://doi.org/10.1051/epjconf/202533701082
- Sarker AK, Alsaadi A, Perera N, et al (2024) Radical-Cylon: A Heterogeneous Data Pipeline for Scientific Computing. Job Scheduling Strategies for Parallel Processing 84–102. https://doi.org/10.1007/978-3-031-74430-3_5
- Atif M, Chopra K, Tsai F-Y, et al (2026) CelloAI Benchmarks: Toward Repeatable Evaluation of AI Assistants
- Yildirim E, Hussein M, Titov M, Kilic OO (2026) Predicting runtime and resource utilization of jobs on integrated cloud and HPC systems. Future Generation Computer Systems 176:108230. https://doi.org/10.1016/j.future.2025.108230
- Merzky A, Titov M, Turilli M, et al (2025) Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications. 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 962–969. https://doi.org/10.1109/ipdpsw66978.2025.00150
- Sri Vatsavai S, Ahmed RK, Hsu K-C, et al (2025) CGSim: A Simulation Framework for Large Scale Distributed Computing Environment. Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis 1478–1483. https://doi.org/10.1145/3731599.3769277
- Chopra K, Cucinell C, Weinberg R, et al (2026) A Spatio-Temporal Analysis Framework for Characterizing Radiation-Induced Genomic Instability. https://doi.org/10.64898/2026.02.21.707188
- Chowdhury T, Maeno T, Akman FF, et al (2025) Machine learning-driven predictive resource management in complex science workflows. International Journal of Modern Physics A 41. https://doi.org/10.1142/s0217751x26500259
- Sarker AK, Alsaadi A, Perera N, et al (2024) Design and Implementation of an Analysis Pipeline for Heterogeneous Data
- Atif M, Chopra K, Kilic O, et al (2025) CelloAI: Leveraging Large Language Models for HPC Software Development in High Energy Physics
- Kurafeeva L, Subedi A, Hartung R, et al (2025) xGFabric: Coupling Sensor Networks and HPC Facilities with Private 5G Wireless Networks for Real-Time Digital Agriculture. Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis 2317–2327. https://doi.org/10.1145/3731599.3767589
- Hsu K-C, Vatsavai SS, Kilic OO, et al (2025) Data Management System Analysis for Distributed Computing Workloads. Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis 279–289. https://doi.org/10.1145/3731599.3767370
- Dutta S, Kilic O, Korchuganova T, et al (2025) Error Analysis of Globally Distributed Workflow Management System. Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis 968–976. https://doi.org/10.1145/3731599.3767461
- Suriyakumar Y, Tallent NR, Marquez A, et al (2024) MemFriend: Understanding Memory Performance with Spatial-Temporal Affinity. Proceedings of the International Symposium on Memory Systems 270–284. https://doi.org/10.1145/3695794.3695820
Research Highlights
- Developed performance observability and analysis methods for scientific workflows, enabling deeper understanding of execution behavior across heterogeneous HPC systems.
- Introduced adaptive and asynchronous execution strategies for integrating simulation, data, and AI/ML components within large-scale workflows.
- Designed performance modeling and surrogate-based approaches to predict and optimize workflow execution, reducing cost and improving scalability.
- Advanced reproducibility and portability of scientific workflows through simplified abstractions and workflow representations.
- Investigated system- and memory-level performance bottlenecks in data-intensive applications, informing more efficient execution strategies on modern HPC architectures.