## 2010 21st IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2010)

Rennes, France, 7-9 July 2010

**IEEE Catalog Number:** CFP10063-PRT

**ISBN:** 978-1-4244-6966-6

## **Table of Contents**

| Paper | Page |
|-------|------|
| **Keynotes** | |
| Convergence of Design and Fabrication Technologies, a Key Enabler for HW-SW Integration | 3 |
| Ahmed A. Jerraya | |
| The Light at the End of the CMOS Tunnel | 4 |
| **Session 1: Mapping for Multi-Core Architectures** | |
| Dynamic Code Mapping for Limited Local Memory Systems | 13 |
| Design of an Automatic Target Recognition Algorithm on the IBM Cell Broadband Engine | 21 |
| Weijia Che and Karam S. Chatha | |
| Highly Efficient Mapping of the Smith-Waterman Algorithm on CUDA-compatible GPUs | 29 |
| Keisuke Dohi, Khaled Benkrid, Cheng Ling, Tsuyoshi Hamada, and Yuichiro Shibata | |
| **Session 2: Design Space Exploration** | |
| ImpEDE: A Multidimensional Design-Space Exploration Framework for Biomedical-Implant Processors | 39 |
| Dhara Dave, Christos Strydis, and Georgi N. Gaydadjiev | |
| Design Space Exploration of Parametric Pipelined Designs | 47 |
| Design Space Exploration for an Embedded Processor with Flexible Datapath Interconnect | 55 |
| Tung Thanh Hoang, Ulf Jälmbrant, Erik der Hagopian, Kasyab P. Subramaniyan, Magnus Själander, and Per Larsson-Edefors | |
| **Session 3: Systems-On-Chip and Networks-On-Chip** | |
| Using Shared Library Interposing for Transparent Application Acceleration in Systems with Heterogeneous Hardware Accelerators | 65 |
| Tobias Beisel, Manuel Niekamp, and Christian Plessl | |
| …less Interconnects | 73 |
| Sujay Deb, Amlan Ganguly, Kevin Chang, Partha Pande, Benjamin Belzer, and Deukhyoun Heo | |
| A Bayesian Network-Based Framework with Constraint Satisfaction Problem (CSP) Formulations for FPGA System Design | 81 |
| Amelia W. Azman, Abbas Bigdeli, Yasir Mohd-Mustafah, Morteza Biglari-Abhari, and Brian C. Lovell | |
| An Optimized NoC Architecture for Accelerating TSP Kernels in Breakpoint Median Problem | 89 |
| Turbo Majumder, Souradip Sarkar, Partha Pande, and Ananth Kalyanaraman | |
| **Session 4: Formal Methods** | |
| A Formal Specification of Fault-Tolerance in Prospecting Asteroid Mission with Reactive Autonomic Systems Framework | 99 |
| Heng Kuang, Olga Ormandjieva, Stan Klasa, and Jamal Bentahar | |
| Comparing the Robustness of Fault-Tolerant Enhancements When Applied to Lookup Tables and Random Logic for Nano-Computing | 107 |
| Yocheved Dotan, Orgad Chen, Gil Katz, and David J. Lilja | |
| Dependability Analysis of a Countermeasure against Fault Attacks by means of Laser Shots onto a SRAM-based FPGA | 115 |
| Gaetan Canivet, Paolo Maistri, Régis Leveugle, Frédéric Valette, Jessy Clédière, and Marc Renaudin | |
| **Session 5: Design and Programming of Array Architectures** | |
| Modeling and Synthesis of Communication Subsystems for Loop Accelerator Pipelines | 125 |
| Hritam Dutta, Frank Hannig, Moritz Schmid, and Joachim Keinert | |
| Design of Throughput-Optimized Arrays from Recurrence Abstractions | 133 |
| A C++-embedded Domain-Specific Language for Programming the MORA Soft Processor Array | 141 |
| Wim Vanderbauwhede, Martin Margala, Sai Rahul Chalamalasetti, and Sohan Purohit | |
| **Session 6: Application-Specific Processors** | |
| A Forwarding-sensitive Instruction Scheduling Approach to Reduce Register File Constraints in VLIW Architectures | 151 |
| Guillermo Payá-Vayá, Javier Martín-Langerwerf, Holger Blume, and Peter Pirsch | |
| Dual-Purpose Custom Instruction Identification Algorithm based on Particle Swarm Optimization | 159 |
| Mehdi Kamal, Neda Kazemian Amiri, Arezoo Kamran, Seyyed Alireza Hoseini, Masoud Dehyadegari, and Masoud Noori | |
| Combined Scheduling and Instruction Selection for Processors with Reconfigurable Cell Fabric | 167 |
| Antoine Floch, Christophe Wolinski, and Krzysztof Kuchcinski | |
| Completeness of Automatically Generated Instruction Selectors | 175 |
| **Session 7: Computer Arithmetics and Cryptography** | |
| Implementation of Binary Edwards Curves for Very-Constrained Devices | 185 |
| Elliptic Curve Point Multiplication on GPUs | 192 |
| Newton-Raphson Algorithms for Floating-Point Division Using an FMA | 200 |
| An FPGA-Specific Algorithm for Direct Generation of Multi-Variate Gaussian Random Numbers | 208 |
| David B. Thomas and Wayne Luk | |
| Automatic Generation of Polynomial-Based Hardware Architectures for Function Evaluation | 216 |
| Florent de Dinechin, Mioara Joldes, and Bogdan Pasca | |
| **Session 8: Application-Specific Architectures** | |
| A Fully-Overlapped Multi-Mode QC-LDPC Decoder Architecture for Mobile WiMAX Applications | 225 |
| Bo Xiang, Dan Bao, Shuangqu Huang, and Xiaoyang Zeng | |
| High Parallel Variation Banyan Network Based Permutation Network for Reconfigurable LDPC Decoder | 233 |
| Xiao Peng, Zhixiang Chen, Xiongxin Zhao, Fumiaki Maehara, and Satoshi Goto | |
| A High Efficient Memory Architecture for H.264/AVC Motion Compensation | 239 |
| Chunshu Li, Kai Huang, Xiaolang Yan, Jiong Feng, De Ma, and De Ge | |
| FPGA-based Lossless Compressors of Floating-Point Data Streams to Enhance Memory Bandwidth | 246 |
| Kazuya Katahira, Kentaro Sano, and Satoru Yamamoto | |
| **Session 9: Power-Aware Architectures** | |
| Power Dissipation Challenges in Multicore Floating-Point Units | 257 |
| On Energy Efficiency of Reconfigurable Systems with Run-Time Partial Reconfiguration | 265 |
| Shaoshan Liu, Richard Neil Pittman, Alessandro Forin, and Jean-Luc Gaudiot | |
| A GALS FFT Processor with Clock Modulation for Low-EMI Applications | 273 |
| **Posters** | |
| Hardware-Assisted Middleware: Acceleration of Garbage Collection Operations | 281 |
| Jie Tang, Shaoshan Liu, Zhimin Gu, Xiao-Feng Li, and Jean-Luc Gaudiot | |
| …Low-Resource Applications | 285 |
| An Efficient Computation Model for Coarse Grained Reconfigurable Architectures and its Applications to a Reconfigurable Computer | 289 |
| Potential of using Block Floating Point Arithmetic in ASIP-Based GNSS-Receivers | 293 |
| Emrah Tasdemir, Götz Kappen, and Tobias G. Noll | |
| Area Optimized H.264 Intra Prediction Architecture for 1080p HD Resolution | 297 |
| Memoryless RNS-to-Binary Converters for the $\{2^{n+1}-1, 2^n, 2^n-1\}$ Moduli Set | 301 |
| A Pipelined Camellia Architecture for Compact Hardware Implementation | 305 |
| General-Purpose FPGA Platform for Efficient Encryption and Hashing | 309 |
| A Compact FPGA-based Architecture for Elliptic Curve Cryptography over Prime Fields | 313 |
| Jo Vliegen, Nele Mentens, Jan Genoe, An Braeken, Serge Kubera, Abdellah Touhafi, and Ingrid Verbauwhede | |
| Implementing Decimal Floating-Point Arithmetic through Binary: Some Suggestions | 317 |
| Nicolas Brisebarre, Nicolas Louvet, Érik Martin-Dorel, Jean-Michel Muller, Adrien Panhaleux, and Milos D. Ercegovac | |
| A New Approach in On-line Task Scheduling for Reconfigurable Computing Systems | 321 |
| Maisam Mansub Bassiri and Hadi Shahriar Shahhoseini | |
| Exploring Algorithmic Trading in Reconfigurable Hardware | 325 |
| Optimizing DDR-SDRAM Communications at C-level for Automatically-Generated Hardware Accelerators – An Experience With the Altera C2H HLS Tool | 329 |
| Deadlock-avoidance for Streaming Applications with Split-Join Structure: Two Case Studies | 333 |
| Peng Li, Kunal Agrawal, Jeremy D. Buhler, Roger D. Chamberlain, and Joseph M. Lancaster | |
| Customizing Controller Instruction Sets for Application-Specific Architectures | 337 |
| Loop Transformations for Interface-based Hierarchies in SDF Graphs | 341 |
| Code Generation for Hardware Accelerated AES | 345 |
| Function Flattening for Lease-Based, Information-Leak-Free Systems | 349 |
| Author Index | 353 |