## ModSim-2025

# ModSim Challenges in Secure and Resilient AI (SARA) System Design

Pradip Bose
Distinguished Research Scientist and **Manager of Efficient and Resilient Systems**
*IBM Research*
pbose@us.ibm.com

This research was developed in part with funding from the Defense Advanced Research Projects Agency (DARPA) and later from the DoD/RAMP-C program. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

### **DARPA-hard Challenges:** a good way of pushing the envelope in systems R&D

### System Architectural Vision for the Cognitive Era (2011)

- Mobile (swarm) computing
  - With on-demand support from cloud
  - Unstable wireless bandwidth
  - Interaction over ad hoc networks
  - Resilient system reconfiguration (on node failure or idle rotation)
- Adaptive abstraction within devices
  - Approximation, sampling, filtering
  - Machine learning acceleration
  - Dynamic voltage and frequency control
- Needs at / near the edge:
  - On-device inference
  - On-device training
  - Low power / voltage (possibly harvested energy)
  - Harsh environment resilience
  - Security against attacks

The domain of mobile cognition

Are there common principles behind architecting resilient, efficient cloud & edge processors?

Meanwhile, the modern age of AI had begun in 2011:

- IBM Watson (Deep Q&A, Jeopardy champion)
- Siri (iPhone/Apple), edge NLP

Program timeline:

- 2013 - 2018 [Bob Colwell, Joe Cross, ...]: **Power Efficiency Revolution for Embedded Computing Technologies**, $1 \text{ GF/W} \rightarrow 75 \text{ GF/W}$. IBM + Stanford, Harvard, U of Virginia
- $2018 \rightarrow 2023/ongoing$ [Tom Rondeau]: **Domain-Specific System on Chip**. Agile SoC, **programmability**; power-perf, programmability, productivity metrics. IBM + Columbia, Harvard, UIUC
- $2021 \rightarrow \text{ongoing}$ [Tom Rondeau]: Data security, **privacy** (IBM was not part of DPRIVE, but we pursued the same goal, 2022-2025, with support from DoD/RAMP-C). IBM + Columbia

### **Executive Summary of ModSim Challenges Faced** (across the three govt-sponsored R&D projects)

# 1. Design Verification (and Test!)

- Architects woefully lack tools and metrics to gauge verification complexity in pre-silicon modeling
- Claims made for agile SoC design avoid factoring in verification time

# 2. Robust Power Management

- On-chip, workload-driven power management architectures have become increasingly advanced and sophisticated
- But ModSim-driven reliability & security guarantees are lacking

# 3. Security Metrics and Pre-Silicon Modeling

Largely absent! (Urgent need)

The deficiencies above cause shortfalls in system resilience and inhibit product-quality deployment of the devised solutions.

# RESILIENCE

In machine terms, it roughly means *reliable operation under error-prone or harsh environments*.

# In human (and perhaps AI?) terms, on the other hand....

Resilience, a key component of <u>emotional intelligence</u>, is essentially the ability to "bounce back" from stressful experiences.

https://www.psychologytoday.com/us/blog/comfort-cravings/201308/getting-back-emotional-intelligence-and-resilience

### What is *Efficient* Resilience?
System design approach to improve efficiency with "guarantees" of operational correctness or quality for a given application workload (even under hostile circumstances)

### The ModSim-Driven PERFECTion

- Experimental set-up used: a full-stack software-hardware system consisting of an FPGA implementation of an open-source processor (LEON3-OpenSparc) with a matrix multiplication application
- Resilience improvement for the current system with our cross-layer technology was evaluated using fault injection at the latch level
- Cross-layer knobs used: selective latch hardening (circuit), parity (logic/microarch), control/dataflow checking (microarch), algorithm-based fault tolerance, ABFT (software)

#### Calculation Assumptions

| Node  | Supply | FIT (shrink) | FIT (voltage) | FIT (total) |
|-------|--------|--------------|---------------|-------------|
| 32 nm | 1.00 V | 1x           | 1x            | 1x          |
| 22 nm | 0.85 V | 2x           | 2x            | 4x          |
| 14 nm | 0.65 V | 4x           | 8x            | 12x         |
| 10 nm | 0.50 V | 8x           | 32x           | 40x         |
| 7 nm  | 0.50 V | 16x          | 32x           | 48x         |

- FIT = unit of failure rate; 1 FIT = 1 failure in a billion hours; system mean time to failure, MTTF ~ 1/FITs
- System FITs will increase with each newer technology node (bad!)
- Two effects are considered here: (a) device size shrinkage per Moore's Law, giving a 2x component count increase per generation; and (b) an increase in transient error rates (SER, voltage noise) as voltage is reduced to meet the end target of 75 GF/W
- Note: FITs are additive, so the last column is the sum of the prior two

### **PERFECT: Overall System Modeling Framework**

- SHIVA-1 framework (delivered in Phase-1): analytical models, open-source software toolset
- SHIVA-2 (delivered in Phase-2): includes cycle-accurate processor core and accelerator elements
- Latch-accurate SHIVA-3 model (Phase-3) will be fully *design-ready*, with key FPGA component prototype implementations

Cross-layer Efficient Resilience Technologies

# Test Chips to Validate Modeled PERFECT Innovations in Efficient Resilience; three accepted papers at VLSI Tech. & Circuits Symposia (Kyoto)

- Ultra low-Vmin SRAM is a major technology breakthrough in the quest for 75 GF/W embedded systems
  - *14nm FinFET Based Supply Voltage Boosting Techniques for Extreme Low Vmin Operation*. R. V. Joshi, M. Ziegler, H. Wetter, C. Wandel, H. Ainspan, **IBM**
- IVR model calibration and proof of voltage-stacking efficacy is a key new advance in exploring optimal Vdd settings for targeted embedded systems
  - *A 16-core voltage-stacking system with an integrated switched-capacitor DC-DC converter*. S. K. Lee, T. Tong, X. Zhang, D. Brooks, G-Y. Wei, **Harvard University**
- Robo-bees brain SoC chip tests provide validation insights about ultra low power cognitive acceleration
  - *A Multi-Chip System Optimized for Insect-Scale Flapping-Wing Robots*. X. Zhang, M. Lok, T. Tong, S. Chaput, S. K. Lee, B. Reagan, H. Lee, D. Brooks, G-Y. Wei, **Harvard University**

### A Couple of Key ModSim-Relevant Papers from our PERFECT Project

**CLEAR Cross-Layer Resilience: A Retrospective.** <u>IEEE Des. Test 42(3)</u>: 74-85 (2025); Eric Cheng et al.
(Stanford-led work)

A key ModSim takeaway: architectural abstractions in fault-injection simulation are hazardous, and the conclusions can be grossly misleading! **Up to 45x inaccuracy**

**BRAVO: Balanced Reliability-Aware Voltage Optimization.** <u>HPCA 2017</u>: 97-108; Karthik Swaminathan et al. (IBM work)

ModSim-driven discourse on how to optimize the voltage-frequency operating point to achieve the highest performance without violating power and reliability constraints

https://www.youtube.com/watch?v=YvbHXz3lccc

That was 10 years ago!

2025 ModSim | August 2025

### **DARPA-hard Challenges:** a good way of pushing the envelope in systems R&D

#### Our recently-completed DARPA (DSSoC) sponsored project:

#### **EPOCHS: Efficient Programmability of Cognitive Heterogeneous Systems**

Connected autonomous vehicles (CAVs)

- Agile design of heterogeneous DSSoCs with programmability as a primary consideration
- Open-source software and hardware
- Technology transition: within IBM and outside, including DoD entities (one example: https://mas400.com)

Tightly knit collaborative team: IBM + UIUC, Harvard and Columbia

Targeted impact on AI hardware roadmap: energy reduction without giving up inferential accuracy

### Agile SoC Design Flow: the Heart of EPOCHS ModSim

### **ESP** SoC Flow

## **EPOCHS/DSSoC: Accomplishments Summary**

#### **EPOCHS-0 SoC tapeout**

4×4 SoC fabricated

#### **Scaled-out EPOCHS-1 SoC tapeout**

6×6 SoC with new accelerators

#### Significant design cost mitigation

10× to 100× reduction in person-years

#### Hardware-agnostic programming of heterogeneous SoCs

HPVM compiler, smart scheduler...
#### **Open-source ecosystem for collaboration**

- ERA: github.com/IBM/era
- HPVM: gitlab.engr.illinois.edu/llvm/hpvm-release
- Mini-ERA: github.com/IBM/mini-era
- STOMP: github.com/IBM/stomp
- **ESP:** www.esp.cs.columbia.edu
- **Scheduler:** github.com/IBM/scheduler-library
- Spandex: github.com/sld-columbia/esp/tree/master/rtl/caches

Chip back from fab + packaging (July 2022); respin: Nov 2023

#### ESSCIRC-2022 paper

#### Simultaneous apps

4 (goal: $\geq$ 2)

#### Integration time for new accelerators

2 weeks average (goal: ≤ 3 months)

#### **Power**

- **NoC:** 7.2% of chip (goal: $\leq$ 40% of chip)
- **Chip:** 240 mW to 1.83 W (op. range: 0.5 V to 1.0 V)
- Peak frequency at 1.0 V: 1.52 GHz

#### Benefits of acceleration

|             | FFT  | Viterbi |
|-------------|------|---------|
| Performance | 71×  | 20×     |
| Energy      | 233× | 56×     |

#### Even more amazing results!
ISSCC-2024, ISCA-2024, VLSI Symp. 2024

### **EPOCHS-1 SoC Highlights**

#### A 12nm Linux-SMP-Capable RISC-V SoC with 14 Accelerator Types, Distributed Hardware Power Management and NoC-Based Data Orchestration

Maico Cassel dos Santos<sup>1\*</sup>, Tianyu Jia<sup>2\*</sup>, Joseph Zuckerman<sup>1\*</sup>, Martin Cochet<sup>3\*</sup>, Davide Giri<sup>1</sup>, Erik Loscalzo<sup>1</sup>, Karthik Swaminathan<sup>3</sup>, Thierry Tambe<sup>2</sup>, Jeff Jun Zhang<sup>2</sup>, Alper Buyuktosunoglu<sup>3</sup>, Kuan-Lin Chiu<sup>1</sup>, Giuseppe Di Guglielmo<sup>1</sup>, Paolo Mantovani<sup>1</sup>, Luca Piccolboni<sup>1</sup>, Gabriele Tombesi<sup>1</sup>, David Trilla<sup>3</sup>, John-David Wellman<sup>3</sup>, En-Yu Yang<sup>2</sup>, Aporva Amarnath<sup>3</sup>, Ying Jing<sup>4</sup>, Bakshree Mishra<sup>4</sup>, Joshua Park<sup>2</sup>, Vignesh Suresh<sup>4</sup>, Sarita Adve<sup>4</sup>, Pradip Bose<sup>3</sup>, David Brooks<sup>2</sup>, Luca P. Carloni<sup>1</sup>, Kenneth L. Shepard<sup>1</sup>, Gu-Yeon Wei<sup>2</sup>

\* These authors have equal contributions.

#### ISSCC-2024 Paper

#### BlitzCoin: Fully Decentralized Hardware Power Management for Accelerator-Rich SoCs

Martin Cochet<sup>1</sup>, Karthik Swaminathan<sup>1</sup>, Erik Loscalzo<sup>2</sup>, Joseph Zuckerman<sup>2</sup>, Maico Cassel dos Santos<sup>2</sup>, Davide Giri<sup>2</sup>, Alper Buyuktosunoglu<sup>1</sup>, Tianyu Jia<sup>3</sup>, David Brooks<sup>3</sup>, Gu-Yeon Wei<sup>3</sup>, Kenneth Shepard<sup>2</sup>, Luca P.
Carloni<sup>2</sup>, and Pradip Bose<sup>1</sup>

<sup>1</sup>IBM Research, Yorktown Heights, NY; <sup>2</sup>Columbia University, New York, NY; <sup>3</sup>Harvard University, Cambridge, MA

#### ISCA-2024 paper

### A 400-ns-Settling-Time Hybrid Dynamic Voltage Frequency Scaling Architecture and Its Application in a 22-Core Network-on-Chip SoC in 12-nm FinFET Technology

Erik Loscalzo<sup>1</sup>, Martin Cochet<sup>2</sup>, Joseph Zuckerman<sup>1</sup>, Samira Zaliasl<sup>3</sup>, Michael Lekas<sup>3</sup>, Stephen Cahill<sup>3</sup>, Tianyu Jia<sup>4</sup>, Karthik Swaminathan<sup>2</sup>, Maico Cassel dos Santos<sup>1</sup>, Davide Giri<sup>1</sup>, Hesam Sadeghi<sup>3</sup>, Joseph Meyer<sup>3</sup>, Noah Sturcken<sup>3</sup>, David Brooks<sup>4</sup>, Gu-Yeon Wei<sup>4</sup>, Luca Carloni<sup>1</sup>, Pradip Bose<sup>2</sup>, Kenneth Shepard<sup>1</sup>

<sup>1</sup>Columbia University, New York, NY; <sup>2</sup>IBM Research, Yorktown Heights, NY; <sup>3</sup>Ferric Inc., New York, NY; <sup>4</sup>Harvard University, Cambridge, MA. E-mail: erik.loscalzo@columbia.edu

VLSI Symp.
2024 paper

- 64 mm<sup>2</sup> SoC designed in 12 nm FinFET
- 35 clock domains; 23 power domains
- 8.4 MB on-chip SRAM memory
- Tile-based SoC architecture
- 34 tiles connected by a 6-plane 2-D mesh NoC
- The 74 Tbps NoC provides flexible orchestration of data
- 23 accelerators of 14 different types
- 10 accelerators compose a cluster demonstrating a novel distributed hardware power management scheme
- Designed by a small team of PhD students, postdocs, and industry researchers in 3 months with ESP, an open-source platform for agile SoC design

# Distributed Hardware Power Management

- Concurrent execution of 5 accelerators under a fixed 80 mW power cap
- Without DHPM (baseline), each tile is allocated a fixed power
- With DHPM, power is dynamically reallocated among tiles

(early-stage concept ModSim)

<sup>\*</sup> Animation frames taken every 100 simulation iterations (animations won't show up in pdf, sorry!)

### Early Interest in Token-Based Power Management

United States Patent, Bose et al.

- Patent No.: US 7,930,578 B2
- **Date of Patent:** Apr. 19, 2011
- Title: METHOD AND SYSTEM OF PEAK POWER **ENFORCEMENT VIA AUTONOMOUS** TOKEN-BASED CONTROL AND MANAGEMENT
- Inventors: **Pradip Bose**, Yorktown Heights, NY (US); Alper Buyuktosunoglu, White Plains, NY (US); Chen-Yong Cher, Port Chester, NY (US); Zhigang Hu, Ridgefield, CT (US); Hans Jacobson, White Plains, NY (US); Prabhakar N.
Kudva, New York, NY (US); Vijayalakshmi Srinivasan, New York, NY (US); Victor Zyuban, Yorktown Heights, NY (US)
- Assignee: International Business Machines Corporation, Armonk, NY (US)

### DSSoC was not just an edge vision or strategy – it applied to server/cloud as well!

In the late CMOS era, domain-specific accelerators will dominate

- Primary server refresh at data center may be progressively delayed
- Differentiation (feature, performance) via domain-specific accelerators
- AI as a domain changes (scales up) at an astounding rate → see next slide!

Agile hardware-software accelerator system synthesis is key to retaining customer base

- Learn customer workloads
- Design plug-in accelerator offerings; refresh choices every 6 mos.
- Highly automated design flow → small team

#### **Case in Point: Large Language Model (LLM) Parameter Growth Over Time\***

<sup>\*</sup> Plot generated by Microsoft Copilot

## **DARPA-hard Challenges:** a good way of pushing the envelope in systems R&D

- 2013 - 2018 [Bob Colwell, Joe Cross, ...]: **Power Efficiency Revolution for Embedded Computing Technologies**, $1 \text{ GF/W} \rightarrow 75 \text{ GF/W}$. IBM + Stanford, Harvard, U of Virginia
- $2018 \rightarrow 2023/ongoing$ [Tom Rondeau]: **Domain-Specific System on Chip**. Agile SoC, **programmability**; power-perf, programmability, productivity metrics. IBM + Columbia, Harvard, UIUC
- $2021 \rightarrow \text{ongoing}$ [Tom Rondeau]: Data security, **privacy** (IBM was not part of DPRIVE, but we pursued the same goal, 2022-2025, with support from DoD/RAMP-C). IBM + Columbia

### Beyond the DARPA DSSoC program .....

Next phase of R&D: worrying about DSSoC resilience at affordable power cost ...
in the context of emerging trends in semiconductor and packaging technology

### Technology Path to 1 Trillion Transistors

IBM AIU: Roadmap of Foundation Model AI accelerators

Key technology and enablement needs:

- State-of-the-art foundry CMOS
- State-of-the-art silicon-verified IP blocks for support functions (memory controllers, I/O interfaces)
- Chiplets and 3D stacking

IBM z System Telum-1, Telum-2 announcements; Spyre accelerator at Hot Chips 2022, 2023

#### AIU 1.0<sup>1</sup>

Optimized for FM Inference

<sup>1</sup> Announced October 2022

Optimized for FM Inference and Fine-Tuning, + Training; leverage HBM

OTTO 291mm<sup>2</sup> 5nm SoC

#### AIU Next

Optimized for future very large FM Inference + Fine-Tuning + Training; leverage 3D-stacked memory + chiplet technologies

#### **CHIPS-Act Linked NAPMP Funding Opportunity**

#### A Chiplets/Systems Design Inflection Point **Enabled by Advanced Packaging**

- Today: packages more like boards
- Tomorrow: packages more like chips

[Figure: estimated wire density in wires/mm (axis marks not to scale), running from wire scarcity to wire abundance. Technologies shown: 5 trace layer PCB; FOWLP, 5 trace layer RDL; 10-layer sub-µm pitch fine line; 2nm, 2 upper metal layers. Values shown: ≈ 20, ≈ 500, ≈ 6,000, ≈ 20,000<sup>2</sup>. Annotation: 10x wire density & growing. Credit: NAPMP Materials and Substrates NOFO<sup>1</sup> Proposer's Day.]

[Figure: **Chiplets/Systems Today** (wire scarcity) compared with **Chiplets/Systems Tomorrow**<sup>3</sup> (wire abundance). Labels: with high-speed high-power interface; wire abundance; scale-down wire-like 2D/3D interface at 10 µm and lower bond pitches; monolithic wafer-scale; 10-100x larger packages; **scale-out** wafer-scale systems that exploit wire abundance; function & physical modularity; board-like integration; **ecosystem** for IP-like heterogeneous chiplet integration.]

[1] P. Chiang, et al., "InFO_oS Technology for Advanced Chiplet Integration," 2021 IEEE 71st ECTC, San Diego, CA, USA, 2021, pp. 130-135.

National Institute of Standards and Technology | U.S.
Department of Commerce

[2] Illustrative, approximate wire density numbers estimated from current state of the art.

[3] NAPMP Vision Paper: The Vision for the CHIPS for America National Advanced Packaging Manufacturing Program (nist.gov)

### Threats to AI hardware

#### Side channel attacks

- Extract sensitive information (e.g., data, model parameters) using hardware side channels (timing, power, etc.)
- E.g., cache-based side channels like Spectre [Kocher et al., S&P 2019] and Meltdown [Lipp et al., USENIX 2018] can be used to extract data regularly accessed by a model

#### Hardware trojans

- Insert malicious hardware at design time to impact AI functionality
- E.g., a hardware trojan hidden in a unit of an SoC can, when triggered, launch a denial-of-service attack that prevents the AI model from continuing computations [Charles et al., DATE 2019]

#### Physical attacks

- E.g., laser or voltage manipulation to alter system behavior and functionality
- [Trouchkine et al., CoRR 2019] showed that electromagnetic fault injection attacks can be used to target individual subsystems within an SoC

### The Need for Privacy-Aware Computing

#### Cost of a data breach by industry (in USD millions)

### What is Homomorphic Encryption (HE)?

- Cryptographic technique that enables processing and manipulation of encrypted data
- Traditional crypto algorithms require data to be decrypted before processing

#### HOMOMORPHIC ENCRYPTION PIPELINE

### **Real-life Business Use Cases**

- There are many other use cases, of course.
- The ones mentioned are representative of IBM's core mainframe business in the financial sector
- The connected autonomous vehicle (CAV) edge sector remains a major area of interest as well

#### AI/FHE Motivation: DARPA-hard Challenge (hardware acceleration)

#### The larger context

#### Data-Secure Computing: the End-to-End Picture

The Encode-Encrypt, Decrypt-Decode (E2D2) client-side task is also important!
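To make the encode-encrypt, compute-on-ciphertext, decrypt-decode flow concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the scheme used in this project: it swaps in a toy Paillier cryptosystem, which is only additively homomorphic (not fully homomorphic), with deliberately tiny and insecure primes. All function names (`keygen`, `encode`, `encrypted_total`, and so on) and the cents-based fixed-point encoding are hypothetical choices made for this example.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key generation from two small primes (NOT secure sizes)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
    mu = pow(lam, -1, n)           # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:     # the blinding factor must be a unit mod n
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    x = pow(c, lam, n * n)         # x is congruent to 1 mod n, so (x - 1) / n is exact
    return ((x - 1) // n) * mu % n

def he_add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (n * n)

def encode(amount: float) -> int:
    """Client-side encode step: fixed-point, two decimal places (cents)."""
    return round(amount * 100)

def decode(value: int) -> float:
    """Client-side decode step: back from cents to a decimal amount."""
    return value / 100

def encrypted_total(amounts, seed: int = 0) -> float:
    rng = random.Random(seed)
    n, priv = keygen(1009, 1013)                          # hypothetical toy primes
    cts = [encrypt(n, encode(a), rng) for a in amounts]   # client: encode + encrypt
    acc = cts[0]
    for c in cts[1:]:
        acc = he_add(n, acc, c)                           # server: sees ciphertexts only
    return decode(decrypt(n, priv, acc))                  # client: decrypt + decode

if __name__ == "__main__":
    print(encrypted_total([12.50, 7.25, 100.00]))
```

The server-side loop never touches a plaintext amount, which is the property the pipeline above relies on. Production FHE schemes add homomorphic multiplication and far larger ciphertexts, which is where the acceleration requirement comes from.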
- Poster child application to utilize the emerging trends in semiconductor and packaging technology (e.g. chiplets/3DHI)
- Aligned with semiconductor business strategy; linked also to the government's CHIPS Act-related thrusts
- 1000x to 500000x acceleration needed to meet performance (e.g. real-time) needs, as addressed in the DARPA DPRIVE program

We started up this project in January 2022 with the above challenge and business opportunity in mind

- The initial (primary) focus is on AI-embedded transactional workloads like financial fraud detection (FFD).
- But there are many other edge-cloud application workloads (with privacy-protection needs) that map into this space (e.g. the cloud-backed connected autonomous vehicular space just mentioned).

#### AI/FHE Hardware Acceleration: Enabling Privacy-Protected Edge-to-Cloud AI Computation

#### Our design strategy and long-term research vision:

1. Leverage our prior agile SoC design methodology (EPOCHS) to implement an AI/FHE SoC (chiplet) in order to demonstrate basic viability for a class of AI-centric inference workloads with an n-chiplet SiP solution (where n = 1, 2 or 4 in the first generation)
   - ✓ Individual chiplet size (area) is determined by yield (cost) constraints for a new technology node
   - ✓ Integrated UCIe interface allows scaled-up system solution with multiple chiplets and on-package memory modules (DDR or HBM)
2. Scalable solution: start with an edge E2D2 capability, scale up to a cloud AI/FHE compute capability
   - Leverage 3DHI and chiplet technology to meet memory capacity and bandwidth requirements

### Circling back to emphasize the benefit of our agile SoC design methodology driven by Columbia's ESP: Impressive Productivity Gain

- **EPOCHS-0** (Oct. 2020): 2 RTL/Verif. Engineers, 6 PD Engineers
- **EPOCHS-1** (Nov. 2022): 3 RTL/Verif. Engineers, 6 PD Engineers
- Mini SoC (Jan. 2024): 1 RTL/Verif. Engineer, 6 PD Engineers
- SARA-1 (Sep. 2025): 9 RTL/Verif.
Engineers, 6 PD Engineers

### SARA-1

- Same advanced technology node
- 8 copies of the SARA Processing Unit (SPU)
- Programmable accelerator
- Large (>10mm<sup>2</sup>)
- Application has complex data-dependency patterns
- One-to-many communication
- Chiplet-based w/ UCIe
- Heterogeneous tile sizes complicate physical design
- Design completes in September 2025

# Motivation for pre-silicon security analysis

#### Typical security cycle

However, it can be expensive and less effective to implement mitigation solutions in post-silicon stages of design or after production.

- Can we prioritize security during early design cycle stages?
- Power, area, and performance are prioritized, but can we include security in these early design considerations as well?
- How can vulnerabilities be found in pre-silicon stages?
- How can security be measured?
- Can security evaluation be automated early in the design cycle to ease designer effort?

Focus on side channel analysis from early design stages

- Pre-silicon: design cycle stages prior to taping out the chip
- Post-silicon: stages after the chip has been taped out

# Our approach to designing systems with power side channel leakage possibility in mind

- Develop a **metric** to quantify how susceptible a design is to power side-channel information leakage
- E.g., 0 to 1 values
- 0 = no information leakage
- 1 = fully vulnerable

Design loop:

1. Develop system design and determine desired security level
2. Build approximate power profile
3. Find possible cases where power side-channels may lead to observable information leakage
4. Use simulator to find how susceptible design is to side-channel leakage cases
5. Iterate on design to reach desired security level

# Benefits of our approach

Our ideas can be applied based on the needs of the system

- E.g., systems handling sensitive data may prioritize secure design over performance

Our approach improves and eases secure design efforts

- E.g., side-channel vulnerability evaluation can be easily automated

Other security considerations may be brought to silicon design stages

- E.g., other physical design phenomena such as electromagnetic emanations may be similarly modeled and evaluated

[Figure: Security Tradeoff Triangle with vertices Security, Performance, and Power]

### **Executive Summary of ModSim Challenges Faced** (across the three govt-sponsored R&D projects)

# 1. Design Verification (and Test!)

Don't forget the SDC scare!

- Architects woefully lack tools and metrics to gauge verification complexity in pre-silicon modeling
- Claims made for agile SoC design avoid factoring in verification time

# 2. Robust Power Management

- On-chip, workload-driven power management architectures have become increasingly advanced and sophisticated
- But ModSim-driven reliability & security guarantees are lacking

# 3. Security Metrics and Pre-Silicon Modeling

Largely absent! (Urgent need)

The deficiencies above cause shortfalls in system resilience and inhibit product-quality deployment of the devised solutions.

DFV/DFT and ModSim thereof

PERFECT: Efficient Resilience In Embedded Computing

# Thank You!

- Pradip Bose, Augusto Vega, IBM T. J. Watson Research Center
- Sarita Adve, Vikram Adve, Sasa Misailovic, University of Illinois at Urbana-Champaign
- Luca Carloni, Ken Shepard, Columbia University
- David Brooks, Gu-Yeon Wei, Vijay Janapa Reddi, Harvard University
- Kevin Skadron, Mircea Stan, University of Virginia
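The 0-to-1 power side-channel susceptibility metric from the pre-silicon security slides can be sketched as follows. The talk does not define the metric, so this example substitutes a common stand-in: the absolute Pearson correlation between a Hamming-weight leakage model and a simulated approximate power profile. The 8-bit XOR operation, the noise level, the masking countermeasure, and every function name here are assumptions chosen for illustration.

```python
import math
import random

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def pearson(xs, ys) -> float:
    """Plain Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def simulate_power(key: int, inputs, masked: bool, noise: float, rng: random.Random):
    """Approximate power profile: Hamming weight of processed values plus noise."""
    trace = []
    for x in inputs:
        v = x ^ key                          # secret-dependent intermediate value
        if masked:
            m = rng.randrange(256)           # fresh random mask per operation
            # First-order Boolean masking: the two shares leak separately, and
            # the mean of their summed weight does not depend on v.
            p = hamming_weight(v ^ m) + hamming_weight(m)
        else:
            p = hamming_weight(v)
        trace.append(p + rng.gauss(0.0, noise))
    return trace

def leakage_score(key: int, masked: bool, n_traces: int = 2000,
                  noise: float = 0.5, seed: int = 0) -> float:
    """Hypothetical metric in [0, 1]: 0 = no observed leakage, 1 = fully vulnerable."""
    rng = random.Random(seed)
    inputs = [rng.randrange(256) for _ in range(n_traces)]
    power = simulate_power(key, inputs, masked, noise, rng)
    model = [hamming_weight(x ^ key) for x in inputs]   # what an attacker would predict
    return min(1.0, abs(pearson(model, power)))

if __name__ == "__main__":
    print("unprotected:", round(leakage_score(0x3C, masked=False), 3))
    print("masked:     ", round(leakage_score(0x3C, masked=True), 3))
```

Comparing the two scores mirrors the "iterate on design" step: the unprotected datapath scores close to 1, while the masked variant scores close to 0 against this first-order model. A real flow would presumably replace the Hamming-weight simulator with a calibrated pre-silicon power model and consider higher-order leakage as well.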