# 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2021)

Virtual Event
9 – 12 May 2021

**IEEE Catalog Number:** CFP21054-POD
**ISBN:** 978-1-6654-0253-8

# Copyright © 2021 by the Institute of Electrical and Electronics Engineers, Inc. All Rights Reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

\*\*\* This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

IEEE Catalog Number: CFP21054-POD
ISBN (Print-On-Demand): 978-1-6654-0253-8
ISBN (Online): 978-1-6654-3555-0
ISSN: 2576-2613

#### Additional Copies of This Publication Are Available From:

Curran Associates, Inc
57 Morehouse Lane
Red Hook, NY 12571 USA
Phone: (845) 758-0400
Fax: (845) 758-2633
E-mail: curran@proceedings.com
Web: www.proceedings.com

# 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

FCCM 2021

#### **Table of Contents**

Organizing Committee, p. xv
Program Committee, p. xvi
Subreviewers, p. xviii

## **Session 1: FPGA CAD**

XBERT: Xilinx Logical-Level Bitstream Embedded RAM Transfusion, p. 1

A Safari through FPGA-Based Neural Network Compilation and Design Automation Flows, p. 10
Patrick Plagwitz (Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany), Frank Hannig (Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany), Martin Ströbel (Schaeffler Technologies AG & Co. KG, Germany), Christoph Strohmeyer (Schaeffler Technologies AG & Co. KG, Germany), and Jürgen Teich (Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany)

Flexible Instrumentation for Live On-chip Debug of Machine Learning Training on FPGAs, p. 20
Daniel Holanda Noronha (University of British Columbia), Zhiqiang Que (Imperial College London), Wayne Luk (Imperial College London), and Steve J.E. Wilton (University of British Columbia)

## **Session 2: Machine Learning 1 (Inference and Time-Series Prediction)**

BoostGCN: A Framework for Optimizing GCN Inference on FPGA, p. 29
Bingyi Zhang (University of Southern California, USA), Rajgopal Kannan (US Army Research Lab, USA), and Viktor Prasanna (University of Southern California, USA)

## **Session 3: Applications 1 (Scientific Computing and Robotics)**

Systematically Migrating an Operational Microphysics Parameterisation to FPGA Technology, p. 69
James Stanley Targett (Imperial College London, UK), Wayne Luk (Imperial College London, UK), Michael Lange (European Centre for Medium-Range Weather Forecasts (ECMWF)), and Olivier Marsden (European Centre for Medium-Range Weather Forecasts (ECMWF))

Solving Large Top-K Graph Eigenproblems with a Memory and Compute-Optimized FPGA Design, p. 78
Francesco Sgherzi (Politecnico di Milano, Italy), Alberto Parravicini (Politecnico di Milano, Italy), Marco Siracusa (Politecnico di Milano, Italy), and Marco D. Santambrogio (Politecnico di Milano, Italy)

## **Session 4: Architecture**

Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs, p. 88

GORDON: Benchmarking Optane DC Persistent Memory Modules on FPGAs, p. 97

FANS: FPGA-Accelerated Near-Storage Sorting, p. 106

Mocarabe: High-Performance Time-Multiplexed Overlays for FPGAs, p. 115

## **Session 5: Applications 2 (Medical, Biology, Physics)**

HEDAcc: FPGA-Based Accelerator for High-Order Epistasis Detection, p. 124
Gaspar Ribeiro (INESC-ID; Instituto Superior Técnico, Universidade de Lisboa, Portugal), Nuno Neves (INESC-ID; Instituto de Telecomunicações, Portugal), Sergio Santander-Jiménez (University of Extremadura, Spain), and Aleksandar Ilic (INESC-ID; Instituto Superior Técnico, Universidade de Lisboa, Portugal)

The Importance of Being X-Drop: High Performance Genome Alignment on Reconfigurable Hardware, p. 133
Marco Rabozzi (Huxelerate S.r.l.), and Marco D. Santambrogio (Politecnico di Milano, Italy)

## **Session 6: Machine Learning 2 (CNNs)**

Optimized FPGA-Based Deep Learning Accelerator for Sparse CNN using High Bandwidth Memory, p. 157
Chao Jiang (University of Florida, USA), David Ojika (University of Florida, USA), Bhavesh Patel (Dell Technologies, USA), and Herman Lam (University of Florida, USA)

unzipFPGA: Enhancing FPGA-Based CNN Engines with On-the-Fly Weights Generation, p. 165
Stylianos I. Venieris (Samsung AI Center-Cambridge, UK), Javier Fernandez-Marques (University of Oxford, UK), and Nicholas D.
Lane (Samsung AI Center, UK; University of Cambridge, UK)

ESCA: Event-Based Split-CNN Architecture with Data-Level Parallelism on UltraScale+ FPGA, p. 176
Pankaj Bhowmik (University of Florida, Gainesville, USA), Md Jubaer Hossain Pantho (University of Florida, Gainesville, USA), Joel Mandebi Mbongue (University of Florida, Gainesville, USA), and Christophe Bobda (University of Florida, Gainesville, USA)

3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA, p. 181

## **Session 7: High-Level Synthesis**

Clockwork: Resource-Efficient Static Scheduling for Multi-rate Image Processing Applications on FPGAs, p. 186

Probabilistic Scheduling in High-Level Synthesis, p. 195
Jianyi Cheng (Imperial College London, UK), John Wickerson (Imperial College London, UK), and George A. Constantinides (Imperial College London, UK)

Extending High-Level Synthesis for Task-Parallel Programs, p. 204
Yuze Chi (University of California, Los Angeles, USA), Licheng Guo (University of California, Los Angeles, USA), Jason Lau (University of California, Los Angeles, USA), Young-kyu Choi (Inha University), Jie Wang (University of California, Los Angeles, USA), and Jason Cong (University of California, Los Angeles, USA)

HLS-Compatible, Embedded-Processor Stream Links, p. 214
Eric Micallef (University of Pennsylvania, USA), Yuanlong Xiao (University of Pennsylvania, USA), and André DeHon (University of Pennsylvania, USA)

An Empirical Study of the Reliability of High-Level Synthesis Tools, p. 219

## **Session 8: Security and Cloud Computing**

Trusted Configuration in Cloud FPGAs, p. 233
Shaza Zeitouni (Technische Universität Darmstadt, Germany), Jo Vliegen (Katholieke Universiteit Leuven, Belgium), Tommaso Frassetto (Technische Universität Darmstadt, Germany), Dirk Koch (The University of Manchester, UK), Ahmad-Reza Sadeghi (Technische Universität Darmstadt, Germany), and Nele Mentens (Katholieke Universiteit Leuven, Belgium; Leiden University, The Netherlands)

Remote Power Attacks on the Versatile Tensor Accelerator in Multi-tenant FPGAs, p. 242
Shanquan Tian (Yale University, USA), Shayan Moini (University of Massachusetts Amherst, USA), Adam Wolnikowski (Yale University, USA), Daniel Holcomb (University of Massachusetts Amherst, USA), Russell Tessier (University of Massachusetts Amherst, USA), and Jakub Szefer (Yale University, USA)

Runtime Detection of Probing/Tampering on Interconnecting Buses, p. 247
Zhenyu Xu (University of Rhode Island, USA), Thomas Mauldin (University of Rhode Island, USA), Qing Yang (University of Rhode Island, USA), and Tao Wei (University of Rhode Island, USA)

## **Poster Papers**

(Southeast University, China)

A Tunable Dual-Edge Time-to-Digital Converter, p. 253
Colin Drewes (University of California, San Diego), Steven Harris (University of California, San Diego), Winnie Wang (University of California, San Diego), Richard Appen (University of California, San Diego), Olivia Weng (University of California, San Diego), Ryan Kastner (University of California, San Diego), William Hunter (Georgia Tech Research Institute), Christopher McCarty (Georgia Tech Research Institute), and Dustin Richmond (University of Washington)

Accelerating Large-Scale Nearest Neighbor Search with Computational Storage Device, p. 254
Ji-Hoon Kim (School of Electrical Engineering, KAIST, Republic of Korea), Yeo-Reum Park (School of Electrical Engineering, KAIST, Republic of Korea), Jaeyoung Do (Microsoft Research, USA), Soo-Young Ji (Memory Business, Samsung Electronics, Republic of Korea), and Joo-Young Kim (School of Electrical Engineering, KAIST, Republic of Korea)

ARC: Reconfigurable Cache Security Assurance with Application-Specific Randomized Mapping in FPGA-Based Heterogeneous Computing, p. 255
Sanjay Gandham (University of Central Florida, USA), Rakin Muhammad Shadab (University of Central Florida, USA), and Mingjie Lin (University of Central Florida, USA)

AutoTEA: Automated Transistor-Level Efficient and Accurate Optimization for GRM FPGA Design, p. 256
Yanze Li (State Key Lab of ASIC & System, School of Microelectronics, Fudan University, China), Yufan Zhang (State Key Lab of ASIC & System, School of Microelectronics, Fudan University, China), Jiafeng Liu (State Key Lab of ASIC & System, School of Microelectronics, Fudan University, China), Jian Wang (State Key Lab of ASIC & System, School of Microelectronics, Fudan University, China), Jinmei Lai (State Key Lab of ASIC & System, School of Microelectronics, Fudan University, China), and Gang Qu (University of Maryland, College Park, United States)

Configurable Pipelined Datapath for Data Acquisition in Interventional Computed Tomography, p. 257
Daniele Passaretti (Otto von Guericke University Magdeburg; Research Campus STIMULATE, Germany) and Thilo Pionteck (Otto von Guericke University Magdeburg, Germany)

DMA Medusa: A Vendor-Independent FPGA-Based Architecture for 400 Gbps DMA Transfers, p. 258
Jan Kubálek (CESNET a.l.e., Czech Republic), Jakub Cabal (CESNET a.l.e., Czech Republic), Martin Špinler (CESNET a.l.e., Czech Republic), and Radek Iša (CESNET a.l.e., Czech Republic)

Edge Accelerator for Lifelong Deep Learning using Streaming Linear Discriminant Analysis, p. 259
Duvindu Piyasena (Nanyang Technological University, Singapore), Siew-Kei Lam (Nanyang Technological University, Singapore), and Meiqing Wu (Nanyang Technological University, Singapore)

Enabling OpenMP Task Parallelism on Multi-FPGAs, p. 260
Ramon Nepomuceno (University of Campinas, Brazil), Renan Sterle (University of Campinas, Brazil), Guilherme Valarini (University of Campinas, Brazil), Marcio Pereira (University of Campinas, Brazil), Hervé Yviquel (University of Campinas, Brazil), and Guido Araujo (University of Campinas, Brazil)

Extending HLS with High-Level Descriptive Language for Configurable Algorithm-Level Spatial Structure Design, p. 261
Chengyue Wang (Zhejiang University, China), Sitao Huang (University of Illinois, USA), Wen-Mei Hwu (University of Illinois, USA), and Deming Chen (University of Illinois, USA)

FERMAT: FPGA-Accelerated Heterogeneous Computing Platform Near NVMe Storage, p. 262
Yu Zou (University of Central Florida) and Mingjie Lin (University of Central Florida)

FFIVE: An FPGA Framework for Interactive VNF Environments, p. 263
Juan Camilo Vega (University of Toronto, Canada), Mohammad Ewais (University of Toronto, Canada), Alberto Leon Garcia (University of Toronto, Canada), and Paul Chow (University of Toronto, Canada)

Heterogeneous Dual-Core Overlay Processor for Light-Weight CNNs, p. 264

Near-Storage Acceleration of Database Query Processing with SmartSSDs, p. 265

NullaNet Tiny: Ultra-low-Latency DNN Inference Through Fixed-Function Combinational Logic, p. 266
Mahdi Nazemi (University of Southern California, USA), Arash Fayyazi (University of Southern California, USA), Amirhossein Esmaili (University of Southern California, USA), Atharva Khare (University of Southern California, USA), Soheil Nazar Shahsavani (University of Southern California, USA), and Massoud Pedram (University of Southern California, USA)

ONT-X: An FPGA Approach to Real-Time Portable Genomic Analysis, p. 268

Particle Mesh Ewald for Molecular Dynamics in OpenCL on an FPGA Cluster, p. 270

Pharos: A Performance Monitor for Multi-FPGA Systems, p. 271

Reconfigurable Synthesizable Synchronization FIFOs, p. 272

Scalable FPGA Median Filtering via a Directional Median Cascade, p. 273

Scheduling Persistent and Fully Cooperative Instructions, p. 274

TOCO: A Systolic Network for Efficient Transposed Convolutions with Output-Reuse Paths, p. 275
Zhengzheng Ma (Peking University) and Guojie Luo (Peking University)

TwinDNN: A Tale of Two Deep Neural Networks, p. 276

Using hls4ml to Map Convolutional Neural Networks on Interconnected FPGA Devices, p. 277
Evangelos Mageiropoulos (Foundation for Research and Technology Hellas (FORTH),
Greece), Nikolaos Chrysos (Foundation for Research and Technology Hellas (FORTH), Greece), Nikolaos Dimou (Foundation for Research and Technology Hellas (FORTH), Greece), and Manolis Katevenis (Foundation for Research and Technology Hellas (FORTH), Greece)

An FPGA Based Hardware Accelerated Framework for Solar Spectra Matching with Parameterized Matched Filter IP Core, p. 278

Time-Domain FPGA Power Delivery Network Characterization Methodology, p. 279
Yanran P. Chen (Xilinx, Inc., USA), Martin L. Voogel (Xilinx, Inc., USA), Ed Priest (Xilinx, Inc., USA), Qian Wang (Xilinx, Inc., USA), Ranjeeth Doppalapudi (Xilinx, Inc., USA), and Praful Jain (Xilinx, Inc., USA)

Author Index, p. 281