## 2016 International Conference on Compilers, Architectures, and Synthesis of Embedded Systems (CASES 2016)

Pittsburgh, Pennsylvania, USA 2 – 7 October 2016



**IEEE Catalog Number:** CFP16CCS-POD

**ISBN:** 978-1-5090-3589-2

### Copyright © 2016, The Association for Computing Machinery, Inc. (ACM) **All Rights Reserved**

\*\*\*This publication is a representation of what appears in the IEEE Digital Libraries. Some format issues inherent in the e-media version may also appear in this print version.

IEEE Catalog Number: CFP16CCS-POD

ISBN (Print-On-Demand): 978-1-5090-3589-2

ISBN (Online): 978-1-4503-4482-1

ISSN: 2381-1560

#### Additional Copies of This Publication Are Available From:

Curran Associates, Inc., 57 Morehouse Lane, Red Hook, NY 12571, USA

Phone: (845) 758-0400

Fax: (845) 758-2633

E-mail: curran@proceedings.com

Web: www.proceedings.com



# 2016 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)

October 2-7, 2016, Pittsburgh Marriott City Center, Pittsburgh, PA

- **1.1** ILP-based Modulo Scheduling for High-level Synthesis.....1 *Julian Oppermann, Andreas Koch, Melanie Reuter-Oppermann, Oliver Sinnen*

- **1.2** Enabling OpenVX support in mW-scale parallel accelerators.....11 *Giuseppe Tagliavini, Germain Haugou, Andrea Marongiu, Luca Benini*
- **1.3** Handling Large Data Sets for High-Performance Embedded Applications in Heterogeneous Systems-on-Chip.....21 *Paolo Mantovani, Emilio G. Cota, Christian Pilato, Giuseppe Di Guglielmo, Luca Carloni*
- **1SS.1** Theoretical Foundations for Workload Modeling with Implications on Power Optimization.....N/A *Paul Bogdan*
- **1SS.2** Performance and Power Management in Wireless NoC-enabled Multicore Chips.....N/A *Partha Pande*
- **1SS.3** Thermal-driven Resource Allocation and Application Mapping for complex Many Core Systems.....N/A *Joerg Henkel*
- **2.1** Runtime Management of Adaptive MPSoCs for Graceful Degradation.....31 *Stavros Tzilis, Ioannis Sourdis, Vasileios Vasilikos, Dimitrios Rodopoulos, Dimitrios Soudris*
- **2.2** Towards the Design of Fault-Tolerant Mixed-Criticality Systems on Multicores.....41 *Luyuan Zeng, Pengcheng Huang, Lothar Thiele*
- **2.3** COMET: Communication-Optimised Multi-threaded Error-detection Technique.....51 *Konstantina Mitropoulou, Vasileios Porpodas, Timothy M. Jones*
- **2SS.1** Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints.....61 *Youhui Zhang, Yu Ji, Wenguang Chen, Yuan Xie*
- **2SS.2** Cambricon: An Instruction Set Architecture for Neural Networks.....N/A *Zidong Du*
- **2SS.3** RRAM based Learning Acceleration.....62 *Yu Wang, Lixue Xia, Ming Cheng, Tianqi Tang, Boxun Li, Huazhong Yang*
- **3.1** A Real-Time Digital-Microfluidic Platform for Epigenetics.....64 *Mohamed Ibrahim, Craig Boswell, Krishnendu Chakrabarty, Kristin Scott, Miroslav Pajic*
- **3.2** LOCUS: Low-Power Customizable Many-Core Architecture for Wearables.....74 *Cheng Tan, Aditi Kulkarni, Vanchinathan Venkataramani, Manupa Karunaratne, Tulika Mitra, Li-Shiuan Peh*
- **3.3** D-PUF: An Intrinsically Reconfigurable DRAM PUF for Device Authentication in Embedded Systems.....84 *Soubhagya Sutar, Arnab Raha, Vijay Raghunathan*


- **4.1** Hybrid Network-on-Chip Architectures for Accelerating Deep Learning Kernels on Heterogeneous Manycore Platforms.....94 *Wonje Choi, Karthi Duraisamy, Ryan Kim, Janardhan Rao Doppa, Partha Pande, Radu Marculescu, Diana Marculescu*

- **4.2** CaffePresso: An Optimized Library for Deep Learning on Embedded Accelerator-based platforms.....104 *Gopalakrishna Hegde, Siddhartha, Nachiappan Ramasamy, Nachiket Kapre*
- **4.3** Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation.....114 *Michel Steuwer, Toomas Remmelg, Christophe Dubach*
- **5.1** Speculative disassembly of binary code.....124 *M. Ammar Ben Khadra, Dominik Stoffel, Wolfgang Kunz*
- **5.2** A jump-target identification method for multi-architecture static binary translation.....134 *Alessandro Di Federico, Giovanni Agosta*
- **5.3** On-the-Fly Load Data Value Tracing in Multicores.....144 *Mounika Ponugoti, Amrish Tewar, Aleksandar Milenkovic*
- **6.1** Redesigning a Tagless Access Buffer to Require Minimal ISA Changes.....154 *Carlos Sanchez, Peter Gavin, Daniel Moreau, Magnus Själander, David Whalley, Per Larsson-Edefors, Sally McKee*
- **6.2** Thrifty-malloc: A HW/SW Codesign for the Dynamic Management of Hardware Transactional Memory in Embedded Multicore Systems.....164 *Thomas Carle, Iris Bahar, Maurice Herlihy, Dimitra Papagiannopoulou, Tali Moreshet, Andrea Marongiu*

- **6.3** FastCollect: Offloading Generational Garbage Collection to Integrated GPU.....174 *Abhinav, Rupesh Nasre*

- **Additional Paper** Power and Thermal Management in Massive Multicore Chips: Theoretical Foundation meets Architectural Innovation and Resource Allocation.....184 *Paul Bogdan, Partha Pratim Pande, Hussam Amrouch, Muhammad Shafique, Jörg Henkel*