## **2021 International Conference on Field-Programmable Technology (ICFPT 2021)**

**Virtual Conference**

6-10 December 2021

**IEEE Catalog Number:** CFP21528-POD

**ISBN:** 978-1-6654-2011-2

## Copyright © 2021 by the Institute of Electrical and Electronics Engineers, Inc. All Rights Reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

\*\*\* This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.
IEEE Catalog Number: CFP21528-POD

ISBN (Print-On-Demand): 978-1-6654-2011-2

ISBN (Online): 978-1-6654-2010-5

## **Additional Copies of This Publication Are Available From:**

Curran Associates, Inc
57 Morehouse Lane
Red Hook, NY 12571 USA

Phone: (845) 758-0400
Fax: (845) 758-2633
E-mail: curran@proceedings.com
Web: www.proceedings.com

## **Table of Contents**

| Title | Authors |
|---|---|
| **Machine Learning** | |
| LETA: A Lightweight Exchangeable-Track Accelerator for EfficientNet Based on FPGA | Jingbo Gao, Yu Qian, Yihan Hu, Xitian Fan, Wai-Shing Luk, Wei Cao and Lingli Wang |
| Efficient Stride 2 Winograd Convolution Method Using Unified Transformation Matrices on FPGA | … Dong |
| Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator | Martin Ferianc, Zhiqiang Que, Hongxiang Fan, Wayne Luk and Miguel Rodrigues |
| A High-Performance and Flexible FPGA Inference Accelerator for Decision Forests Based on Prior Feature Space Partitioning | |
| **Machine Learning (Short Papers)** | |
| Low Precision Networks for Efficient Inference on FPGAs | |
| **High Level Synthesis and Electronic Design Automation** | |
| FLOWER: A Comprehensive Dataflow Compiler for High-Level Synthesis | |
| **High Level Synthesis and Electronic Design Automation (Short Papers)** | |
| AMAH-Flex: A Modular and Highly Flexible Tool for Generating Relocatable Systems on FPGAs | |
| Profiling-Based Control-Flow Reduction in High-Level Synthesis | |
| On the Performance Effect of Loop Trace Window Size on Scheduling for Configurable Coarse Grain Loop Accelerators | |
| **Mathematical Computations and Digital Signal Processing** | |
| StreamSVD: Low-rank Approximation and Streaming Accelerator Co-design | |
| High Performance Lattice Regression on FPGAs via a High Level Hardware Description Language | Nathan Zhang, Matthew Feldman and Kunle Olukotun |
| **Mathematical Computations and Digital Signal Processing (Short Papers)** | |
| Dataflow Systolic Array Implementations of Exploring Dual-Triangular Structure in QR Decomposition Using High Level Synthesis | |
| Exponential Sine Sweep Measurement Implementation Targeting FPGA Platforms | Alexander Klemd, Patrick Nowak, Piero Rivera Benois, Etienne Gerat, Bernd Klauer and Udo Zölzer |
| Parallel-Pipeline Fast Walsh-Hadamard Transform Implementation Using HLS | |
| FPGAs as General-Purpose Accelerators for Non-Experts via HLS: The Graph Analysis Example | Pedro Filipe Silva, João Bispo and Nuno Paulino |
| **Image Processing and Computer Vision** | |
| ac²SLAM: FPGA Accelerated High-Accuracy SLAM with Heapsort and Parallel Keypoint Extractor | |
| A Streaming Hardware Architecture for Real-Time SIFT Feature Extraction | |
| A Unified Accelerator Design for LiDAR SLAM Algorithms for Low-end FPGAs | |
| Algorithm-Hardware Co-Optimization for Energy-Efficient Drone Detection on Resource-Constrained FPGA | |
| **Image Processing and Computer Vision (Short Papers)** | |
| Energy-efficient FPGA-accelerated LiDAR-based SLAM for Embedded Robotics | |
| **Architecture and Devices** | |
| General Routing Architecture Modelling and Exploration for Modern FPGAs | |
| FastCGRA: A Modeling, Evaluation, and Exploration Platform for Large-Scale Coarse-Grained Reconfigurable Arrays | |
| APIR-DSP: An Approximate PIR-DSP Architecture for Error-Tolerant Applications | Yuan Dai, Simin Liu, Yao Lu, Hao Zhou, Seyedramin Rasoulinezhad, Philip H.W. Leong and Lingli Wang |
| **Architecture and Devices (Short Papers)** | |
| Characterization of IOBUF-based Ring Oscillators | |
| A Hexagon-Based Honeycomb Routing Architecture for FPGA | |
| **Networks and Data Management** | |
| Increasing Memory Efficiency of Hash-Based Pattern Matching for High-Speed Networks | Tomáš Fukač, Jiří Matoušek, Jan Kořenek and Lukáš Kekely |
| Tens of gigabytes per second JSON-to-Arrow conversion with FPGA accelerators | |
| StreamZip: Compressed Sliding-Windows for Stream Aggregation | |
| Scalable and Flexible High-Performance In-Network Processing of Hash Joins in Distributed Databases | |
| **Networks and Data Management (Short Papers)** | |
| Efficient Queue-Balancing Switch for FPGAs | |
| **Systems and Security** | |
| Efficient Physical Page Migrations in Shared Virtual Memory Reconfigurable Computing Systems | |
| StateLink: FPGA System Debugging via Flexible Simulation/Hardware Integration | Sameh Attia and Vaughn Betz |
| **Systems and Security (Short Papers)** | |
| In-Storage Computation of Histograms with Differential Privacy | |
| **Physical Computations** | |
| High-Performance Hardware Implementation of CRYSTALS-Dilithium | |
| A Modular RFSoC-based Approach to Interface Superconducting Quantum Bits | |
| An Efficient RTL Buffering Scheme for an FPGA-Accelerated Simulation of Diffuse Radiative Transfer | Kazuki Furukawa, Tomoya Yokono, Yoshiki Yamaguchi, Kohji Yoshikawa, Norihisa Fujita, Ryohei Kobayashi, Taisuke Boku and Masayuki Umemura |
| **Physical Computations (Short Papers)** | |
| Total-ionizing-dose tolerance evaluation of an optoelectronic field programmable gate array VLSI in operation | |
| **PhD Forum papers** | |
| A High-Precision Flexible Symmetry-Aware Architecture for Element-Wise Activation Functions | |
| High-performance pipeline architecture for packet classification accelerator in DPU | Jing Tan, Gaofeng Lv, Yanni Ma and Guanjie Qiao |
| Parallelized Technology Mapping to General PLBs by Adaptive Circuit Partitioning | Xiaoxi Wang, Moucheng Yang, Zhen Li and Lingli Wang |
| Resource-saving FPGA Implementation of the Satisfiability Problem Solver: AmoebaSATslim | |
| An area-efficient multiply-accumulation architecture and implementations for time-domain neural processing | |
| Real-time Implementation of Cyclostationary Analysis using FPGAs | |
| **Design Competition papers** | |
| An Autonomous Driving System Utilizing Image Processing Accelerated by FPGA | Kazunari Takasaki, Kota Hisafuru, Ryotaro Negishi, Kazuki Yamashita, Keisuke Fukada, Tomoya Wakaizumi and Nozomu Togawa |
| An FPGA-based Image Recognition with Remote Update Functions for Autonomous Driving on "ad-refkit" | |
| SoC FPGA Implementation of an Unmanned Mobile Vehicle with an Image Transmission System over VNC | |
| Zytlebot: FPGA Integrated ROS-Based Autonomous Mobile Robot | |
| Autonomous Driving System implemented on Robot Car using SoC FPGA | |
| Development of Autonomous Driving System based on Image Recognition using Programmable SoCs | |
| A Dataset Generation for Object Recognition and a Tool for Generating ROS2 FPGA Node | |
| Fast Controlling Autonomous Vehicle Based on Real Time Image Processing | |