# 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC 2021)

Virtual Conference

18 – 21 January 2021

IEEE Catalog Number: CFP21ASP-POD

ISBN: 978-1-7281-8057-1

# Copyright © 2021, Association for Computing Machinery (ACM) All Rights Reserved

\*\*\* This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

ISBN (Print-On-Demand): 978-1-7281-8057-1

ISBN (Online): 978-1-4503-7999-1

ISSN: 2153-6961

### Additional Copies of This Publication Are Available From:

Curran Associates, Inc

57 Morehouse Lane

Red Hook, NY 12571 USA

Phone: (845) 758-0400

Fax: (845) 758-2633

E-mail: curran@proceedings.com

Web: www.proceedings.com

# **ASP-DAC 2021**

# Contents

| Front Matter | |
|---|---|
| Welcome to ASP-DAC 2021 | X |
| Message from the Technical Program Committee | X |
| Organizing Committee | xi |
| Technical Program Committee | ii |
| University LSI Design Contest Committee | V |
| Designers' Forum Committee | vi |
| Steering Committee | ii |
| University LSI Design Contest | ii |
| Designers' Forum | X |
| ACM SIGDA Student Research Forum at ASP-DAC 2021 | X |
| Best Paper Award | xi |
| University LSI Design Contest Award | ii |
| 10-Year Retrospective Most Influential Paper Award | V |
| Keynote Addresses | V |
| Invitation to ASP-DAC 2022 | ii |
| List of Reviewers (Regular topic) | ii |
| List of Reviewers (UDC) | X |
| Technical Program | |
| Session 1A: University Design Contest I | |
| A DSM-based Polar Transmitter with 23.8% System Efficiency | 1 |
| A 0.41W 34Gb/s 300GHz CMOS Wireless Transceiver | 3 |
| Capacitive Sensor Circuit with Relative Slope-Boost Method Based on a Relaxation Oscillator | 5 |
| 28GHz Phase Shifter with Temperature Compensation for 5G NR Phased-array Transceiver | 7 |
| An up to 35 dBc/Hz Phase Noise Improving Design Methodology for Differential-Ring-Oscillators Applied in Ultra-Low Power Systems | 9 |
| Gate Voltage Optimization in Capacitive DC-DC Converters for Thermoelectric Energy Harvesting | 11 |
| An 0.57 GOPS/DSP Object Detection PIM Accelerator on FPGA | 13 |
| Supply Noise Reduction Filter for Parallel Integrated Transimpedance Amplifiers | 15 |

| Session 1B: Accelerating Design and Simulation | |
|---|---|
| A Fast Yet Accurate Message-level Communication Bus Model for Timing Prediction of SDFGs on MPSoC | 17 |
| Simulation of Ideally Switched Circuits in SystemC | 23 |
| HW-BCP: A Custom Hardware Accelerator for SAT Suitable for Single Chip Implementation for Large Benchmarks | 29 |
| Session 1C: Process-in-Memory for Efficient and Robust AI | |
| A Novel DRAM-Based Process-in-Memory Architecture and its Implementation for CNNs | 35 |
| A Quantized Training Framework for Robust and Accurate ReRAM-based Neural Network Accelerators | 43 |
| Attention-in-Memory for Few-Shot Learning with Configurable Ferroelectric FET Arrays | 49 |
| Session 1D: Validation and Verification | |
| Mutation-based Compliance Testing for RISC-V | 55 |
| A General Equivalence Checking Framework for Multivalued Logic | 61 |
| ATLaS: Automatic Detection of Timing-based Information Leakage Flows for SystemC HLS Designs | 67 |
| Session 1E: Design Automation Methods for Various Microfluidic Platforms | |
| A multi-commodity network flow based routing algorithm for paper-based digital microfluidic biochips | 73 |
| Interference-free Design Methodology for Paper-Based Digital Microfluidic Biochips | 79 |
| Accurate and Efficient Simulation of Microfluidic Networks | 85 |
| Session 2A: University Design Contest II | |
| A 65nm CMOS Process Li-ion Battery Charging Cascode SIDO Boost Converter with 89% Maximum Efficiency for RF Wireless Power Transfer Receiver | 91 |
| A High Accuracy Phase and Amplitude Detection Circuit for Calibration of 28GHz Phased Array Beamformer System | 93 |
| A Highly Integrated Energy-efficient CMOS Millimeter-wave Transceiver with Direct-modulation Digital Transmitter, Quadrature Phased-coupled Frequency Synthesizer and Substrate-Integrated Waveguide E-shaped Patch Antenna | 95 |
| A 3D-Stacked SRAM Using Inductive Coupling Technology for AI Inference Accelerator in 40-nm CMOS | 97 |
| Sub-10-μm Coil Design for Multi-Hop Inductive Coupling Interface | 99 |
| Current-Starved Chaotic Oscillator Over Multiple Frequency Decades on Low-Cost CMOS | 101 |
| TCI tester: Tester for Through Chip Interface | 103 |
| An 18 Bit Time-to-Digital Converter Design with Large Dynamic Range and Automated Multi-Cycle Concept | 105 |

| Session 2B: Emerging Non-Volatile Processing-In-Memory for Next Generation Computing | |
|---|---|
| Connection-based Processing-In-Memory Engine Design Based on Resistive Crossbars | 107 |
| FePIM: Contention-Free In-Memory Computing Based on Ferroelectric Field-Effect Transistors | 114 |
| RIME: A Scalable and Energy-Efficient Processing-In-Memory Architecture for Floating-Point Operations | 120 |
| A Non-Volatile Computing-In-Memory Framework With Margin Enhancement Based CSA and Offset Reduction Based ADC | 126 |
| Session 2C: Emerging Trends for Cross-Layer Co-Design: From Device, Circuit, to Architecture, Application | |
| Cross-layer Design for Computing-in-Memory: From Devices, Circuits, to Architectures and Applications | 132 |
| Session 2D: Machine Learning Techniques for EDA in Analog/Mixed-Signal ICs | |
| Automatic Surrogate Model Generation and Debugging of Analog/Mixed-Signal Designs Via Collaborative Stimulus Generation and Machine Learning | 140 |
| A Robust Batch Bayesian Optimization for Analog Circuit Synthesis via Local Penalization | 146 |
| Layout Symmetry Annotation for Analog Circuits with Graph Neural Networks | 152 |
| Fast and Efficient Constraint Evaluation of Analog Layout using Machine Learning Models | 158 |
| Session 2E: Innovating Ideas in VLSI Routing Optimization | |
| TreeNet: Deep Point Cloud Embedding for Routing Tree Construction | 164 |
| A Unified Printed Circuit Board Routing Algorithm With Complicated Constraints and Differential Pairs | 170 |
| Multi-FPGA Co-optimization: Hybrid Routing and Competitive-based Time Division Multiplexing Assignment | 176 |
| Boosting Pin Accessibility Through Cell Layout Topology Diversification | 183 |
| Session 3A: ML-Driven Approximate Computing | |
| Approximate Computing for ML: State-of-the-art, Challenges and Visions | 189 |
| Session 3B: Architecture-Level Exploration | |
| Bridging the Frequency Gap in Heterogeneous 3D SoCs through Technology-Specific NoC Router Architectures | 197 |
| Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs | 204 |
| A Dynamic Link-latency Aware Cache Replacement Policy (DLRP) | 210 |
| Prediction of Register Instance Usage and Time-sharing Register for Extended Register Reuse Scheme | |
| Session 3C: Core Circuits for AI Accelerators | |
| Residue-Net: Multiplication-free Neural Network by In-situ, No-loss Migration to Residue Number Systems | 222 |
| A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for Accelerating | |
| DeepOpt: Optimized Scheduling of CNN Workloads for ASIC-based Systolic Deep Learning Accelerators | 235 |
| Value-Aware Error Detection and Correction for SRAM Buffers in Low-Bitwidth, Floating-Point CNN Accelerators | 242 |
| Session 3D: Stochastic and Approximate Computing | |
| MIPAC: Dynamic Input-Aware Accuracy Control for Dynamic Auto-Tuning of Iterative Approximate Computing | 248 |
| Normalized Stability: A Cross-Level Design Metric for Early Termination in Stochastic Computing | 254 |
| Zero Correlation Error: A Metric for Finite-Length Bitstream Independence in Stochastic Computing | 260 |
| An Efficient Approximate Node Merging with an Error Rate Guarantee | 266 |
| Session 3E: Timing Analysis and Timing-Aware Design | |
| An Adaptive Delay Model for Timing Yield Estimation under Wide-Voltage Range | 272 |
| ATM: A High Accuracy Extracted Timing Model for Hierarchical Timing Analysis | 278 |
| Mode-wise Voltage-scalable Design with Activation-aware Slack Assignment for Energy Minimization | 284 |
| A Timing Prediction Framework for Wide Voltage Design with Data Augmentation Strategy | 291 |
| Session 4A: Technological Advancements inside the AI chips, and using the AI Chips | |
| Energy-Efficient Deep Neural Networks with Mixed-Signal Neurons and Dense-Local and Sparse-Global Connectivity | |
| Merged Logic and Memory Fabrics for AI Workloads | 305 |
| Vision Control Unit in Fully Self Driving Vehicles using Xilinx MPSoC and Opensource Stack | 311 |
| Session 4B: System-Level Modeling, Simulation, and Exploration | |
| Constrained Conservative State Symbolic Co-analysis for Ultra-low-power Embedded Systems | 318 |
| Arbitrary and Variable Precision Floating Point Arithmetic Support in Dynamic Binary Translation | 325 |
| Optimizing Temporal Decoupling using Event Relevance | 331 |
| Design Space Exploration of Heterogeneous-Accelerator SoCs with Hyperparameter Optimization | 338 |
| Session 4C: Neural Network Optimizations for Compact AI Inference | |
| DNR: A Tunable Robust Pruning Framework Through Dynamic Network Rewiring of DNNs | 344 |
| Dynamic Programming Assisted Quantization Approaches for Compressing Normal and Robust DNN | 351 |

| Session 4D: Brain-Inspired Computing | |
|---|---|
| Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators | 372 |
| A reduced-precision streaming SpMV architecture for Personalized PageRank on FPGA | 378 |
| HyperRec: Efficient Recommender Systems with Hyperdimensional Computing | 384 |
| Efficient Techniques for Training the Memristor-based Spiking Neural Networks Targeting Better Speed, Energy and Lifetime | 390 |
| Session 4E: Cross-Layer Hardware Security | |
| PCBench: Benchmarking of Board-Level Hardware Attacks and Trojans | 396 |
| Cache-Aware Dynamic Skewed Tree for Fast Memory Authentication | 402 |
| Automated Test Generation for Hardware Trojan Detection using Reinforcement Learning | 408 |
| On the Impact of Aging on Power Analysis Attacks Targeting Power-Equalized Cryptographic Circuits | 414 |
| Session 5B: Embedded Operating Systems and Information Retrieval | |
| Energy-Performance Co-Management of Mixed-Sensitivity Workloads on Heterogeneous Multi-core Systems | 421 |
| Optimizing Inter-Core Data-Propagation Delays in Industrial Embedded Systems under Partitioned Scheduling | 428 |
| LiteIndex: Memory-Efficient Schema-Agnostic Indexing for JSON documents in SQLite | |
| Session 5C: Security Issues in AI and Their Impacts on Hardware Security | |
| Micro-architectural Cache Side-Channel Attacks and Countermeasures | 441 |
| Security of Neural Networks from Hardware Perspective: A Survey and Beyond | 449 |
| Learning Assisted Side Channel Delay Test for Detection of Recycled ICs | 455 |
| ML-augmented Methodology for Fast Thermal Side-Channel Emission Analysis | 463 |
| Session 5D: Advances in Logic and High-level Synthesis | |
| 1<sup>st</sup>-Order to 2<sup>nd</sup>-Order Threshold Logic Gate Transformation with an Enhanced ILP-based Identification Method | 469 |
| A Novel Technology Mapper for Complex Universal Gates | 475 |
| High-Level Synthesis of Transactional Memory | 481 |
| Session 5E: Hardware-Oriented Threats and Solutions in Neural Networks | |
| VADER: Leveraging the Natural Variation of Hardware to Enhance Adversarial Attack | 487 |
| Entropy-Based Modeling for Estimating Adversarial Bit-flip Attack Impact on Binarized Neural Network | 493 |
| A Low Cost Weight Obfuscation Scheme for Security Enhancement of ReRAM Based Neural Network Accelerators | 499 |

| Session 6B: Advanced Optimizations for Embedded Systems | |
|---|---|
| Puncturing the memory wall: Joint optimization of network compression with approximate memory for ASR application | 505 |
| Canonical Huffman Decoder on Fine-grain Many-core Processor Arrays | |
| A Decomposition-Based Synthesis Algorithm for Sparse Matrix-Vector Multiplication in Parallel Communication Structure | 518 |
| Session 6C: Design and Learning of Logic Circuits and Systems | |
| Learning Boolean Circuits from Examples for Approximate Logic Synthesis | 524 |
| Read your Circuit: Leveraging Word Embedding to Guide Logic Optimization | 530 |
| Exploiting HLS-Generated Multi-Version Kernels to Improve CPU-FPGA Cloud Systems | 536 |
| Session 6D: Hardware Locking and Obfuscation | |
| Area Efficient Functional Locking through Coarse Grained Runtime Reconfigurable Architectures | 542 |
| ObfusX: Routing Obfuscation with Explanatory Analysis of a Machine Learning Attack | 548 |
| Breaking Analog Biasing Locking Techniques via Re-Synthesis | 555 |
| Session 6E: Efficient Solutions for Emerging Technologies | |
| Energy and QoS-Aware Dynamic Reliability Management of IoT Edge Computing Systems | 561 |
| Light: A Scalable and Efficient Wavelength-Routed Optical Networks-On-Chip Topology | 568 |
| One-Pass Synthesis for Field-coupled Nanocomputing Technologies | 574 |
| Session 7A: Platform-Specific Neural Network Acceleration | |
| Real-Time Mobile Acceleration of DNNs: From Computer Vision to Medical Applications | 581 |
| Dynamic Neural Network to Enable Run-Time Trade-off between Accuracy and Latency | 587 |
| When Machine Learning Meets Quantum Computer: Network-Circuit Co-Design via Quantum-Aware Neural Architecture Search | 593 |
| Improving Efficiency in Neural Network Accelerator using Operands Hamming Distance Optimization | |
| Lightweight Run-Time Working Memory Compression for Deployment of Deep Neural Networks on Resource-Constrained MCUs | 607 |
| Session 7B: Toward Energy-Efficient Embedded Systems | |
| EHDSktch: A Generic Low Power Architecture for Sketching in Energy Harvesting Devices | 615 |
| Energy-Aware Design Methodology for Myocardial Infarction Detection on Low-Power Wearable Devices | 621 |
| Power-Efficient Layer Mapping for CNNs on Integrated CPU and GPU Platforms: A Case Study | 627 |
| A Write-friendly Arithmetic Coding Scheme for Achieving Energy-Efficient Non-Volatile Memory Systems | 633 |

| Session 7C: Software and System Support for Nonvolatile Memory | |
|---|---|
| DP-Sim: A Full-stack Simulation Infrastructure for Digital Processing In-Memory Architectures | 639 |
| SAC: A Stream Aware Write Cache Scheme for Multi-Streamed Solid State Drives | 645 |
| Providing Plug N' Play for Processing-in-Memory Accelerators | 651 |
| Aging Aware Request Scheduling for Non-Volatile Main Memory | 657 |
| Session 7D: Learning-Driven VLSI Layout Automation Techniques | |
| Placement for Wafer-Scale Deep Learning Accelerator | 665 |
| Net<sup>2</sup>: A Graph Attention Network Method Customized for Pre-Placement Net Length Estimation | 671 |
| Machine Learning-based Structural Pre-route Insertability Prediction and Improvement with Guided Backpropagation | 678 |
| Standard Cell Routing with Reinforcement Learning and Genetic Algorithm in Advanced Technology Nodes | |
| Session 7E: DNN-Based Physical Analysis and DNN Accelerator Design | |
| Thermal and IR Drop Analysis Using Convolutional Encoder-Decoder Networks | 690 |
| GRA-LPO: Graph Convolution Based Leakage Power Optimization | 697 |
| DEF: Differential Encoding of Featuremaps for Low Power Convolutional Neural Network Accelerators | 703 |
| Temperature-Aware Optimization of Monolithic 3D Deep Neural Network Accelerators | 709 |
| Session 8B: Embedded Neural Networks and File Systems | |
| Gravity: An Artificial Neural Network Compiler for Embedded Applications | 715 |
| A Self-Test Framework for Detecting Fault-induced Accuracy Drop in Neural Network Accelerators | 722 |
| Facilitating the Efficiency of Secure File Data and Metadata Deletion on SMR-based Ext4 File System | 728 |
| Session 8C: Design Automation for Future Autonomy | |
| Efficient Computing Platform Design for Autonomous Driving Systems | 734 |
| On Designing Computing Systems for Autonomous Vehicles: a PerceptIn Case Study | 742 |
| Runtime Software Selection for Adaptive Automotive Systems | 748 |
| Safety-Assured Design and Adaptation of Learning-Enabled Autonomous Systems | 753 |
| Session 8D: Emerging Hardware Verification | |
| System-Level Verification of Linear and Non-Linear Behaviors of RF Amplifiers using Metamorphic Relations | 761 |
| Random Stimuli Generation for the Verification of Quantum Circuits | 767 |
| Exploiting Extended Krylov Subspace for the Reduction of Regular and Singular Circuit Models | 773 |
| Session 8E: Optimization and Mapping Methods for Quantum Technologies | |
| Algebraic and Boolean Optimization Methods for AQFP Superconducting Circuits | 779 |
| Dynamical Decomposition and Mapping of MPMCT Gates to Nearest Neighbor Architectures | 786 |
| Exploiting Quantum Teleportation in Quantum Circuit Mapping | 792 |
| Session 9B: Emerging System Architectures for Edge-AI | |
| Hardware-Aware NAS Framework with Layer Adaptive Scheduling on Embedded System | 798 |
| Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package | 806 |
| Block-Circulant Neural Network Accelerator Featuring Fine-Grained Frequency-Domain Quantization and Reconfigurable FFT Modules | 813 |
| BatchSizer: Power-Performance Trade-off for DNN Inference | 819 |
| Session 9C: Cutting-Edge EDA Techniques for Advanced Process Technologies | |
| Deep Learning for Mask Synthesis and Verification: A Survey | 825 |
| Physical Synthesis for Advanced Neural Network Processors | 833 |
| Advancements and Challenges on Parasitic Extraction for Advanced Process Technologies | 841 |
| Session 9D: Robust and Reliable Memory Centric Computing at Post-Moore | |
| Reliability-Aware Training and Performance Modeling for Processing-In-Memory Systems | 847 |
| Robustness of Neuromorphic Computing with RRAM-based Crossbars and Optical Neural Networks | 853 |
| Uncertainty Modeling of Emerging Device based Computing-in-Memory Neural Accelerators with Application to Neural Architecture Search | |
| A Physical-Aware Framework for Memory Network Design Space Exploration | 865 |
| Session 9E: Design for Manufacturing and Soft Error Tolerance | |
| Manufacturing-Aware Power Staple Insertion Optimization by Enhanced Multi-Row Detailed Placement Refinement | 872 |
| A Hierarchical Assessment Strategy on Soft Error Propagation in Deep Learning Controller | 878 |
| Attacking a CNN-based Layout Hotspot Detector Using Group Gradient Method | 885 |
| Bayesian Inference on Introduced General Region: An Efficient Parametric Yield Estimation Method for Integrated Circuits | 892 |
| Analog IC Aging-induced Degradation Estimation via Heterogeneous Graph Convolutional Networks | 898 |