# 2015 International Conference on Field Programmable Technology (FPT 2015)

Queenstown, New Zealand 7 – 9 December 2015



IEEE Catalog Number: CFP15528-POD ISBN: 978-1-4673-9092-7

## Copyright © 2015 by the Institute of Electrical and Electronics Engineers, Inc. **All Rights Reserved**

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

\*\*\*This publication is a representation of what appears in the IEEE Digital Libraries. Some format issues inherent in the e-media version may also appear in this print version.

IEEE Catalog Number: CFP15528-POD ISBN (Print-On-Demand): 978-1-4673-9092-7 ISBN (Online): 978-1-4673-9091-0

### Additional Copies of This Publication Are Available From:

Curran Associates, Inc., 57 Morehouse Lane, Red Hook, NY 12571, USA

Phone: (845) 758-0400

Fax: (845) 758-2633

E-mail: curran@proceedings.com

Web: www.proceedings.com



## **Table of Contents**

| Title and authors | Page |
|-------------------|------|
| **Oral Session O1: Applications I** | |
| Accelerated Cell Imaging and Classification on FPGAs for Quantitative-phase Asymmetric-detection Time-stretch Optical Microscopy<br>Junyi Xie, Xinyu Niu, Andy K. S. Lau, Kevin K. Tsia, Hayden K. H. So | 1 |
| FPGA Acceleration of Reference-Based Compression for Genomic Data<br>James Arram, Moritz Pflanzer, Thomas Kaplan, Wayne Luk | 9 |
| Leftmost Longest Regular Expression Matching in Reconfigurable Logic<br>Kubilay Atasu | 17 |
| Bringing Programmability to the Data Plane: Packet Processing with a NoC-Enhanced FPGA<br>Andrew Bitar, Mohamed S. Abdelfattah, Vaughn Betz | 24 |
| **Oral Session O2: High Level Synthesis: Debugging** | |
| An Adaptive Virtual Overlay for Fast Trigger Insertion for FPGA Debug<br>Fatemeh Eslami and Steven J.E. Wilton | 32 |
| Using Round-Robin Tracepoints to Debug Multithreaded HLS Circuits on FPGAs<br>Jeffrey Goeders and Steven J.E. Wilton | 40 |
| Using Source-to-Source Compilation to Instrument Circuits for Debug with High Level Synthesis<br>Joshua S. Monson and Brad Hutchings | 48 |
| **Oral Session O3: Architecture** | |
| QuickDough: A Rapid FPGA Loop Accelerator Design Framework Using Soft CGRA Overlay<br>Cheng Liu, Ho-Cheung Ng, Hayden Kwok-Hay So | 56 |
| Energy Minimization in the Time-Space Continuum<br>Hyunseok Park, Shreel Vijayvargiya, Andre DeHon | 64 |
| Automatic FPGA System and Interconnect Construction with Multicast and Customizable Topology<br>Alex Rodionov, Jonathan Rose | 72 |
| Improved Carry Chain Mapping for the VTR Flow<br>Ana Petkovska, Grace Zgheib, David Novo, Muhsen Owaida, Alan Mishchenko and Paolo Ienne | 80 |
| **Oral Session O4:** | |
| HETRIS: Adaptive Floorplanning for Heterogeneous FPGAs<br>Kevin E. Murray and Vaughn Betz | 88 |
| Analyzing the Divide between FPGA Academic and Commercial Results<br>Elias Vansteenkiste, Alireza Kaviani and Henri Fraisse | 96 |
| OpenCL Library of Stream Memory Components Targeting FPGAs<br>Jasmina Vasiljevic, Ralph Wittig, Paul Schumacher, Jeff Fifield, Fernando Martinez Vallina, Henry Styles, Paul Chow | 104 |
| Exploring Pipe Implementations using an OpenCL Framework for FPGAs<br>Vincent Mirian, Paul Chow | 112 |
| **Oral Session O5: Applications II** | |
| An Exact MCMC Accelerator Under Custom Precision Regimes<br>Shuanglong Liu, Grigorios Mingas, Christos-Savvas Bouganis | 120 |
| FPGA Implementation of Low-Power and High-PSNR DCT/IDCT Architecture based on Adaptive Recoding CORDIC<br>Jianfeng Zhang, Paul Chow, Hengzhu Liu | 128 |
| Braiding: a Scheme for Resolving Hazards in Kernel Adaptive Filters<br>Stephen Tridgell, Duncan J.M. Moss, Nicholas J. Fraser and Philip H.W. L | 136 |
| **Oral Session O6: High Level Synthesis II** | |
| Custom-Sized Caches in Application-Specific Memory Hierarchies<br>Felix Winterstein, Kermin Fleming, Hsin-Jung Yang, John Wickerson, George Constantinides | 144 |
| Resource and Memory Management Techniques for the High-Level Synthesis of Software Threads into Parallel FPGA Hardware<br>Jongsok Choi, Jason Anderson, Stephen Brown | 152 |
| Provably Correct Development of Reconfigurable Hardware Designs via Equational Reasoning<br>Ian Graves, Adam Procter, William L. Harrison, Gerard Allwein | 160 |
| **Poster Session P1:** | |
| Behavioral-Level IP Integration in High-Level Synthesis<br>Liwei Yang, Swathi Gurumani, Deming Chen, Kyle Rupnow | 172 |
| Optimized High-Level Synthesis of SMT Multi-Threaded Hardware Accelerators<br>Jens Huthmann, Andreas Koch | 176 |
| Minimizing DSP Block Usage Through Multi-Pumping<br>Bajaj Ronak and Suhaib A. Fahmy | 184 |
| An Adaptive Cross-Layer Fault Recovery Solution for Reconfigurable SoCs<br>Jifang Jin, Jian Yan, Xuegong Zhou and Lingli Wang | 188 |
| A Co-Design Approach for Accelerated SQL Query Processing via FPGA-based Data Filtering<br>Andreas Becher, Daniel Ziener, Klaus Meyer-Wegener, and Jurgen Teich | 192 |
| A Self-aware Data Compression System on FPGA in Hadoop<br>Yubin Li, Yuliang Sun, Guohao Dai, Yuzhi Wang, Jiacai Ni, Yu Wang, Guoliang Li, Huazhong Yang | 196 |
| **Poster Session P2:** | |
| An FPGA-based Real-time Simultaneous Localization and Mapping System<br>Mengyuan Gu, Kaiyuan Guo, Wenqiang Wang, Yu Wang, Huazhong Yang | 200 |
| Hardware Design of a Fast, Parallel Random Tree Path Planner<br>Size Xiao, Adam Postula and Neil Bergmann | 204 |
| Lower Precision for Higher Accuracy: Precision and Resolution Exploration for Shallow Water Equations<br>James Stanley Targett, Xinyu Niu, Francis Russell, Wayne Luk, Stephen Jeffress, Peter Duben | 208 |
| Comparison of Thread Signatures for Error Detection in Hybrid Multi-cores<br>Sebastian Meisner and Marco Platzner | 212 |
| Advanced Bayer Demosaicing on FPGAs<br>Donald Bailey, Sharmil Randhawa, Jim S. Jimmy Li | 216 |
| JIT Trace-based Verification for High-Level Synthesis<br>Liwei Yang, Magzhan Ikram, Swathi Gurumani, Suhaib Fahmy, Deming Chen, Kyle Rupnow | 228 |
| **PhD Forum** | |
| Cryptographic Techniques in Redundant Number Systems<br>Jason Motha, Andrew Bainbridge-Smith, Steve Weddell | 232 |
| 2D Discrete Fourier Transform with Simultaneous Edge Artifact Removal for Real-Time Applications<br>Faisal Mahmood, Mart Toots, Lars-Goran Ofverstedt and Ulf Skoglund | 236 |
| FPGA based Acceleration of FDAS Module for Pulsar Search<br>Haomiao Wang, Oliver Sinnen | 240 |
| FPGA Implementation of a SIMD-Based Array Processor with Torus Interconnect<br>Yuki Murakami | 244 |
| **Demonstration Session** | |
| An Efficient Architecture for Zero Overhead Data En-/Decryption using Reconfigurable Cryptographic Engine<br>Bony H.K. Chen, Paul Y.S. Cheung, Peter Y.K. Cheung, Yu-Kwong Kwok | 248 |
| Smart Camera for Trax Playing Robot<br>Donald G Bailey | 252 |
| **Design Competition** | |
| Development of a TRAX Artificial Intelligence Algorithm using Path and Edge<br>Ryo Okuda, Tomohiro Tanaka, Keisuke Yamamoto, Takumu Yahagi and Kazuya Tanigawa | 256 |
| FPGA Trax Solver based on a Neural Network Design<br>Takumi Fujimori, Tomoya Akabe, Yoshizumi Ito, Kouta Akagi, Shinya Furukawa, Hiroki Shinba, Aoi Tanibata, and Minoru Watanabe | 260 |
| An Architecture-Algorithm Co-Design of Artificial Intelligence for Trax Player<br>Qing Lu, Chiu-Wing Sham and Francis C. M. Lau | 264 |
| An Implementation of Trax Player using Programmable SoC<br>Akira Kojima | 268 |
| Trax Solver on Zynq with Deep Q-Network<br>Naru Sugimoto, Takuji Mitsuishi, Takahiro Kaneda, Chiharu Tsuruta, Ryotaro Sakai, Hideki Shimura and Hideharu Amano | 272 |