# 2023 28th Asia and South Pacific Design Automation Conference (ASP-DAC 2023)

Tokyo, Japan, 16-19 January 2023

IEEE Catalog Number: CFP23ASP-POD

ISBN: 978-1-6654-6561-8

## Copyright © 2023, Association for Computing Machinery (ACM)

**All Rights Reserved**

This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

ISBN (Print-On-Demand): 978-1-6654-6561-8

ISBN (Online): 978-1-4503-9783-4

ISSN: 2153-6961

### Additional Copies of This Publication Are Available From:

Curran Associates, Inc.

57 Morehouse Lane

Red Hook, NY 12571 USA

Phone: (845) 758-0400

Fax: (845) 758-2633

E-mail: curran@proceedings.com

Web: www.proceedings.com

# **ASP-DAC 2023**

# Contents

| Front Matter |
|---|
| Welcome to ASP-DAC 2023 |
| Message from the Technical Program Committee |
| Organizing Committee |
| Technical Program Committee |
| University LSI Design Contest Committee |
| Designers' Forum Committee |
| Steering Committee |
| University LSI Design Contest |
| Designers' Forum |
| ACM SIGDA Student Research Forum at ASP-DAC 2023 |
| Best Paper Award |
| University LSI Design Contest Award |
| 10-Year Retrospective Most Influential Paper Award |
| Keynote Addresses |
| Invitation to ASP-DAC 2024 |
| List of Reviewers (Regular topic) |
| List of Reviewers (UDC) |

## Technical Program

| Session 1A: Reliability Considerations for Emerging Computing and Memory Architectures |
|---|
| A Fast Semi-Analytical Approach for Transient Electromigration Analysis of Interconnect Trees using Matrix Exponential |
| Chiplet Placement for 2.5D IC with Sequence Pair Based Tree and Thermal Consideration |
| An On-line Aging Detection and Tolerance Framework for Improving Reliability of STT-MRAMs |

| Session 1B: Accelerators and Equivalence Checking |
|---|
| Automated Equivalence Checking Method for Majority based In-Memory Computing on ReRAM Crossbars |
| An Equivalence Checking Framework for Agile Hardware Design |
| Towards High Bandwidth Utilization SpMV on FPGAs via Partial Vector Duplication |

| Session 1C: New Frontiers in Cyber-Physical and Autonomous Systems |
|---|
| Safety-driven Interactive Planning for Neural Network-based Lane Changing |
| Safety-Aware Flexible Schedule Synthesis for Cyber-Physical Systems using Weakly-Hard Constraints |
| Mixed-Traffic Intersection Management Utilizing Connected and Autonomous Vehicles as Traffic Regulators |

| Session 1D: Machine Learning Assisted Optimization Techniques for Analog Circuits |
|---|
| Fully Automated Machine Learning Model Development for Analog Placement Quality Prediction |
| Efficient Hierarchical mm-Wave System Synthesis with Embedded Accurate Transformer and Balun Machine Learning Models |
| APOSTLE: Asynchronously Parallel Optimization for Sizing Analog Transistors using DNN Learning |

| Session 2A: Machine Learning for Reliable, Secure, and Cool Chips: A Journey from Transistors to Systems |
|---|
| ML to the Rescue: Reliability Estimation from Self-Heating and Aging in Transistors all the Way up Processors |
| Graph Neural Networks: A Powerful and Versatile Tool for Advancing Design, Reliability, and Security of ICs |
| Detection and Classification of Malicious Bitstreams for FPGAs in Cloud Computing |
| Learning Based Spatial Power Characterization and Full-Chip Power Estimation for Commercial TPUs |

| Session 2B: High Performance Memory for Storage and Computing |
|---|
| DECC: Differential ECC for Read Performance Optimization on High-Density NAND Flash Memory |
| Optimizing Data Layout for Racetrack Memory in Embedded Systems |
| Exploring Architectural Implications to Boost Performance for in-NVM B+-tree |
| An Efficient Near-Bank Processing Architecture for Personalized Recommendation System |

| Session 2C: Cool and Efficient Approximation |
|---|
| PAALM: Power Density Aware Approximate Logarithmic Multiplier Design |
| Approximate Floating-Point FFT Design with Wide Precision-Range and High Energy Efficiency |
| RUCA: RUntime Configurable Approximate Circuits with Self-Correcting Capability |
| Approximate Logic Synthesis by Genetic Algorithm with an Error Rate Guarantee |

| Session 2D: Logic Synthesis for AQFP, Quantum Logic, AI driven and efficient Data Layout for HBM |
|---|
| Depth-optimal Buffer and Splitter Insertion and Optimization in AQFP Circuits |
| Area-driven FPGA Logic Synthesis Using Reinforcement Learning |
| Optimization of Reversible Logic Networks with Gate Sharing |
| Iris: Automatic Generation of Efficient Data Layouts for High Bandwidth Utilization |

| Session 2E: University Design Contest |
|---|
| ViraEye: An Energy-Efficient Stereo Vision Accelerator with Binary Neural Network in 55 nm CMOS |
| A 1.2nJ/Classification Fully Synthesized All-Digital Asynchronous Wired-Logic Processor Using Quantized Non-linear Function Blocks in 0.18µm CMOS |
| A Fully Synthesized 13.7μJ/prediction 88% Accuracy CIFAR-10 Single-Chip Data-Reusing Wired-Logic Processor Using Non-Linear Neural Network |
| A Multimode Hybrid Memristor-CMOS Prototyping Platform Supporting Digital and Analog Projects |
| A fully synchronous digital LDO with built-in adaptive frequency modulation and implicit dead-zone control |
| Demonstration of Order Statistics Based Flash ADC in a 65nm Process |

| Session 3A: Synthesis of Quantum Circuits and Systems |
|---|
| A SAT Encoding for Optimal Clifford Circuit Synthesis |
| An SMT-Solver-based Synthesis of NNA-Compliant Quantum Circuits Consisting of CNOT, H and T Gates |
| Compilation of Entangling Gates for High-Dimensional Quantum Systems |
| WIT-Greedy: Hardware System Design of Weighted ITerative Greedy Decoder for Surface Code |
| Quantum Data Compression for Efficient Generation of Control Pulses |

| Session 3B: In-Memory/Near-Memory Computing for Neural Networks |
|---|
| Toward Energy-Efficient Sparse Matrix-Vector Multiplication with Near STT-MRAM Computing Architecture |
| RIMAC: An Array-level ADC/DAC-free ReRAM-based In-Memory DNN Processor with Analog Cache and Computation |
| Crossbar-Aligned & Integer-Only Neural Network Compression for Efficient In-Memory Acceleration |
| Discovering the In-Memory Kernels of 3D Dot-Product Engines |
| RVComp: Analog Variation Compensation for RRAM-based In-Memory Computing |

| Session 3D: Machine Learning-Based Design Automation |
|---|
| Rethink before Releasing your Model: ML Model Extraction Attack in EDA |
| MacroRank: Ranking Macro Placement Solutions Leveraging Translation Equivariancy |
| BufFormer: A Generative ML Framework for Scalable Buffering |
| Decoupling Capacitor Insertion Minimizing IR-Drop Violations and Routing DRVs |
| DPRoute: Deep Learning Framework for Package Routing |

| Session 4A: Advanced Techniques for Yields, Low Power and Reliability |
|---|
| High Dimensional Yield Estimation using Shrinkage Deep Features and Maximization of Integral Entropy Reduction |
| MIA-aware Detailed Placement and VT Reassignment for Leakage Power Optimization |
| SLOGAN: SDC Probability Estimation Using Structured Graph Attention Network |

| Session 4B: Microarchitectural Design and Neural Networks |
|---|
| Microarchitecture Power Modeling via Artificial Neural Network and Transfer Learning |
| MUGNoC: A Software-configured Multicast-Unicast-Gather NoC for Accelerating CNN Dataflows |
| COLAB: Collaborative and Efficient Processing of Replicated Cache Requests in GPU |

| Session 4C: Novel Techniques for Scheduling and Memory Optimizations in Embedded Software |
|---|
| Mixed-Criticality with Integer Multiple WCETs and Dropping Relations: New Scheduling Challenges |
| An Exact Schedulability Analysis for Global Fixed-Priority Scheduling of the AER Task Model |
| Skyrmion Vault: Maximizing Skyrmion Lifespan for Enabling Low-Power Skyrmion Racetrack Memory |

| Session 4D: Efficient Circuit Simulation and Synthesis for Analog Designs |
|---|
| Parallel Incomplete LU Factorization Based Iterative Solver for Fixed-Structure Linear Equations in Circuit Simulation |
| Accelerated Capacitance Simulation of 3-D Structures With Considerable Amounts of General Floating Metals |
| On Automating Finger-Cap Array Synthesis with Optimal Parasitic Matching for Custom SAR ADC |

| Session 5A: Security of Heterogeneous Systems Containing FPGAs |
|---|
| FPGANeedle: Precise Remote Fault Attacks from FPGA to CPU |
| FPGA Based Countermeasures against Side channel Attacks on Block Ciphers |

| Session 5B: Novel Application & Architecture-Specific Quantization Techniques |
|---|
| Block-Wise Dynamic-Precision Neural Network Training Acceleration via Online Quantization Sensitivity Analytics |
| Quantization Through Search: A Novel Scheme to Quantize Convolutional Neural Networks in Finite Weight Space |
| Multi-Wavelength Parallel Training and Quantization-Aware Tuning for WDM-Based Optical Convolutional Neural Networks Considering Wavelength-Relative Deviations |
| Semantic Guided Fine-grained Point Cloud Quantization Framework for 3D Object Detection |

| Session 5C: Approximate Brain-Inspired Architectures for Efficient Learning |
|---|
| ReMeCo: Reliable Memristor-Based In-Memory Neuromorphic Computation |
| SyFAxO-GeN: Synthesizing FPGA-based Approximate Operators with Generative Networks |
| Approximating HW Accelerators through Partial Extractions onto Shared Artificial Neural Networks |
| DependableHD: A Hyperdimensional Learning Framework for Edge-oriented Voltage-scaled Circuits |

| Session 5D: Retrospect and Prospect of Verification and Test Technologies |
|---|
| EDDY: A Multi-Core BDD Package With Dynamic Memory Management and Reduced Fragmentation |
| Exploiting Reversible Computing for Verification: Potential, Possible Paths, and Consequences |
| Automatic Test Pattern Generation and Compaction for Deep Neural Networks |
| Wafer-Level Characteristic Variation Modeling Considering Systematic Discontinuous Effects |

| Session 6A: Computing, Erasing, and Protecting: the Security Challenges for the Next Generation of Memories |
|---|
| Hardware Security Primitives using Passive RRAM Crossbar Array: Novel TRNG and PUF Designs |
| Data Sanitization on eMMCs |
| Fundamentally Understanding and Solving RowHammer |

| Session 6B: System-Level Codesign in DNN Accelerators |
|---|
| Hardware-Software Codesign of DNN Accelerators using Approximate Posit Multipliers |
| Reusing GEMM Hardware for Efficient Execution of Depthwise Separable Convolution on ASIC-based DNN Accelerators |
| BARVINN: Arbitrary Precision DNN Accelerator Controlled by a RISC-V CPU |
| Agile Hardware and Software Co-design for RISC-V-based Multi-precision Deep Learning Microprocessor |

| Session 6C: New Advances in Hardware Trojan Detection |
|---|
| Hardware Trojan Detection Using Shapley Ensemble Boosting |
| ASSURER: A PPA-friendly Security Closure Framework for Physical Design |
| Static Probability Analysis Guided RTL Hardware Trojan Test Generation |
| Hardware Trojan Detection and High-Precision Localization in NoC-based MPSoC using Machine Learning |

| Session 6D: Advances in Physical Design and Timing Analysis |
|---|
| An Integrated Circuit Partitioning and TDM Assignment Optimization Framework for Multi-FPGA Systems |
| A Robust FPGA Router with Concurrent Intra-CLB Rerouting |
| Efficient Global Optimization for Large Scaled Ordered Escape Routing |
| An Adaptive Partition Strategy of Galerkin Boundary Element Method for Capacitance Extraction |
| Graph-Learning-Driven Path-Based Timing Analysis Results Predictor from Graph-Based Timing Analysis |

| Session 7A: Brain-inspired Hyperdimensional Computing to the Rescue for beyond von Neumann Era |
|---|
| Beyond von Neumann Era: Brain-inspired Hyperdimensional Computing to the Rescue |

| Session 7B: System Level Design Space Exploration |
|---|
| System-Level Exploration of In-Package Wireless Communication for Multi-Chiplet Platforms |
| Efficient System-Level Design Space Exploration for High-Level Synthesis using Pareto-Optimal Subspace Pruning |
| Automatic Generation of Complete Polynomial Interpolation Design Space for Hardware Architectures |

| Session 7C: Security Assurance and Acceleration |
|---|
| SHarPen: SoC Security Verification by Hardware Penetration Test |
| SecHLS: Enabling Security Awareness in High-Level Synthesis |
| A Flexible ASIC-oriented Design for a Full NTRU Accelerator |

| Session 7D: Hardware and Software Co-design of Emerging Machine Learning Algorithms |
|---|
| Robust Hyperdimensional Computing Against Cyber Attacks and Hardware Errors: A Survey |
| In-Memory Computing Accelerators for Emerging Learning Paradigms |
| Toward Fair and Efficient Hyperdimensional Computing |

| Session 8A: Full-Stack Co-design for On-Chip Learning in AI Systems |
|---|
| Improving the Robustness and Efficiency of PIM-based Architecture by SW/HW Co-design |
| Hardware-Software Co-Design for On-Chip Learning in AI Systems |
| Towards On-Chip Learning for Low Latency Reasoning with End-to-End Synthesis |

| Session 8B: Energy-Efficient Computing for Emerging Applications |
|---|
| Knowledge Distillation in Quantum Neural Network using Approximate Synthesis |
| NTGAT: A Graph Attention Network Accelerator with Runtime Node Tailoring |
| A Low-Bitwidth Integer-STBP Algorithm for Efficient Training and Inference of Spiking Neural Networks |
| TiC-SAT: Tightly-coupled Systolic Accelerator for Transformers |

| Session 8C: Side-Channel Attacks and RISC-V Security |
|---|
| PMU-Leaker: Performance Monitor Unit-based Realization of Cache Side-Channel Attacks |
| EO-Shield: A Multi-function Protection Scheme against Side Channel and Focused Ion Beam Attacks |
| CompaSeC: A Compiler-assisted Security Countermeasure to Address Instruction Skip Fault Attacks on RISC-V |
| Trojan-D2: Post-Layout Design and Detection of Stealthy Hardware Trojans - a RISC-V Case Study |

| Session 8D: Simulation and Verification of Quantum Circuits |
|---|
| Graph Partitioning Approach for Fast Quantum Circuit Simulation |
| A Robust Approach to Detecting Non-equivalent Quantum Circuits Using Specially Designed Stimuli |
| Quantum Algorithms |
| Software Tools for Decoding Quantum Low-Density Parity Check Codes |

| Session 9A: Learning x Security in DFM |
|---|
| Enabling Scalable AI Computational Lithography with Physics-Inspired Models |
| Data-Driven Approaches for Process Simulation and Optical Proximity Correction |
| Mixed-Type Wafer Failure Pattern Recognition |

| Session 9B: Lightweight Models for Edge AI |
|---|
| Accelerating Convolutional Neural Networks in Frequency Domain via Kernel-sharing Approach |
| Mortar: Morphing the Bit Level Sparsity for General Purpose Deep Learning Acceleration |
| Data-Model-Circuit Tri-design for Ultra-light Video Intelligence on Edge Devices |
| Latent Weight-based Pruning for Small Binary Neural Networks |

| Session 9D: Design Automation for Emerging Devices |
|---|
| AutoFlex: Unified Evaluation and Design Framework for Flexible Hybrid Electronics |
| CNFET7: An Open Source Cell Library for 7-nm CNFET Technology |
| A Global Optimization Algorithm for Buffer and Splitter Insertion in Adiabatic Quantum-Flux-Parametron Circuits |
| FLOW-3D: Flow-Based Computing on 3D Nanoscale Crossbars with Minimal Semiperimeter |