## 2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization

(CGO 2011)

Chamonix, France 2 – 6 April 2011



**IEEE Catalog Number:** CFP11CGO-PRT

**ISBN:** 978-1-61284-356-8

## **Table of Contents**

| Message from the General Chair                                                                                                                                             |        |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|
| Message from the Program Co-chairs                                                                                                                                         |        |
| Organizing Committee                                                                                                                                                       |        |
| Program Committee                                                                                                                                                          | x      |
| Additional Reviewers                                                                                                                                                       |        |
| Sponsors                                                                                                                                                                   | xii    |
| WORKSHOPS                                                                                                                                                                  |        |
| Workshop on Intermediate Representations                                                                                                                                   | xiii   |
| Workshop "analyse to compile, compile to analyse" | xiv |
| First International Workshop on Polyhedral Compilation Techniques | xv |
| Fifth Workshop on Statistical and Machine learning approaches to ARchitecture and compilaTion                                                                              |        |
| Third International Workshop on GCC Research Opportunities                                                                                                                 | xvii   |
| Third Workshop on Infrastructures for Software/Hardware co-design                                                                                                          | xviii  |
| Workshop on Optimizations for DSP and Embedded Systems                                                                                                                     | xix    |
| TUTORIALS                                                                                                                                                                  |        |
| Array Building Blocks: A Dynamic Compiler for Data-parallel Heterogeneous Systems                                                                                          | xx     |
| Building Dynamic Instrumentation Tools with DynamoRIO  Derek Bruening (Google), Qin Zhao (MIT)                                                                             | xxi    |
| Essential Abstractions in GCC                                                                                                                                              | xxii   |
| GPU Programming Models, Optimizations and Tuning                                                                                                                           | xxiii  |
| Detailed Pin!  Tevi Devor (Intel), Robert Cohn (Intel) | xxiv |
| PIPS: An Interprocedural Extensible Source-to-Source Compiler Infrastructure for Code/Application Transformations and Instrumentations | xxv |
| AlphaZ and the Polyhedral Equational Model | xxvi |
| S. Rajopadhye (Colorado State University)                                                                                                                                  |        |
| Program Optimization through Loop Vectorization                                                                                                                            | xxvii  |
| Reconciling Compilers and Timing Analysis for Safety-Critical Real-Time Systems - the WCET-aware C Compiler WCC                                                            | xxviii |
| Inside X10: Implementing a High-level Language on Distributed and Heterogeneous Platforms  Olivier Tardieu (IBM Research), David Cunningham, Igor Peshansky (IBM Research) | xxix   |
| KEYNOTES                                                                                                                                                                   |        |
| The Language, Optimizer, and Tools Mess                                                                                                                                    | xxx    |
| Formally Verifying a Compiler: Why? How? How Far?                                                                                                                          | xxxi   |

## Low Level Code Optimization

| MAO - an Extensible Micro-Architectural Optimizer                                                                                                                                                                                                                       | 1   |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Phase-based Tuning for Better Utilization of Performance-Asymmetric Multicore Processors                                                                                                                                                                                | 11  |
| Tyler Sondag (Iowa State University), Hridesh Rajan (Iowa State University) | |
| Dynamic Register Promotion of Stack Variables                                                                                                                                                                                                                           | 21  |
| Jianjun Li (Institute of Computing Technology, Chinese Academy of Sciences), Chenggang Wu (Institute of Computing Technology, Chinese Academy of Sciences), Wei-Chung Hsu (National Chiao Tung University)                                                              |     |
| Link-Time Optimization for Power Efficiency in a Tagless Instruction Cache                                                                                                                                                                                              | 32  |
| Speculation and Transactional Memory                                                                                                                                                                                                                                    |     |
| The Runtime Abort Graph and its Application to Software Transactional Memory Optimization                                                                                                                                                                               | 42  |
| LAR-CC: Large Atomic Regions with Conditional Commits                                                                                                                                                                                                                   | 54  |
| Runtime Automatic Speculative Parallelization                                                                                                                                                                                                                           | 64  |
| Ben Hertzberg (Stanford University), Kunle Olukotun (Stanford University)                                                                                                                                                                                               |     |
| Dynamically Accelerating Client-side Web Applications through Decoupled Execution                                                                                                                                                                                       | 74  |
| Language Support for Optimization                                                                                                                                                                                                                                       |     |
| Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms                                                                                                                                                                                              | 85  |
| Automated Programmable Control and Parameterization of Compiler Optimizations                                                                                                                                                                                           | 97  |
| Extendable Pattern-Oriented Optimization Directives                                                                                                                                                                                                                     | 107 |
| Vectorization and Parallelization                                                                                                                                                                                                                                       |     |
| Predictive Modeling in a Polyhedral Optimization Space  Eunjung Park (University of Delaware), Louis-Noel Pouchet (The Ohio State University), John Cavazos (University of Delaware), Albert Cohen (INRIA Saclay), P. Sadayappan (The Ohio State University)            | 119 |
| Automatic parallelization of fine-grained meta-functions on a Chip Multiprocessor                                                                                                                                                                                       | 130 |
| Whole-Function Vectorization                                                                                                                                                                                                                                            | 141 |
| Vapor SIMD: Auto-Vectorize Once, Run Everywhere                                                                                                                                                                                                                         | 151 |
| Data Locality                                                                                                                                                                                                                                                           |     |
| On-Chip Cache Hierarchy Aware Tile Scheduling for Multicore Machines  Jun Liu (The Pennsylvania State University), Yuanrui Zhang (The Pennsylvania State University), Wei Ding (The Pennsylvania State University), Mahmut Kandemir (The Pennsylvania State University) | 161 |
| Pinpointing Data Locality Problems Using Data-centric Analysis | 171 |

| Automated Locality Optimization based on the Reuse Distance of String Operations                                                                                                | 181 |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Neighborhood-Aware Data Locality Optimization for NoC-Based Multicores                                                                                                          | 191 |
| Program Safety                                                                                                                                                                  |     |
| AccuLock: Accurate and Efficient Detection of Data Races                                                                                                                        | 201 |
| Practical Memory Checking with Dr. Memory                                                                                                                                       | 213 |
| Derek Bruening (Google), Qin Zhao (Massachusetts Institute of Technology)                                                                                                       |     |
| Dynamic Compilation                                                                                                                                                             |     |
| Intel's Array Building Blocks: A Retargetable, Dynamic Compiler and Embedded Language                                                                                           | 224 |
| Chris J. Newburn (Intel Corporation), Byoungro So (Intel Corporation), Zhenying Liu (Intel Corporation), Michael McCool (Intel Corporation), Anwar Ghuloum (Intel Corporation), Stefanus Du Toit (Intel Corporation), Zhi Gang Wang (Intel Corporation), Zhao Hui Du (Intel Corporation), Yongjian Chen (Intel Corporation), Peng Guo (Intel Corporation), Zhanglin Liu (Intel Corporation), Dan Zhang (Intel Corporation) | |
| A HW/SW Co-designed Multi-Core Virtual Machine for Energy-Efficient General Purpose Computing                                                                                   | 236 |
| A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler                                                                                                        | 246 |
| Hiroshi Inoue (IBM Research), Hiroshige Hayashizaki (IBM Research), Peng Wu (IBM Research), Toshio Nakatani (IBM Research)                                                      |     |
| Using Machines to Learn Method-Specific Compilation Strategies                                                                                                                  | 257 |
| Ricardo Nabinger Sanchez (University of Alberta), Jose Nelson Amaral (University of Alberta), Duane Szafron (University of Alberta), Marius Pirvu (IBM Toronto Software Laboratory), Mark Stoodley (IBM Toronto Software Laboratory) | |
| Program Analysis                                                                                                                                                                |     |
| Prioritizing Constraint Evaluation for Efficient Points-to Analysis                                                                                                             | 267 |
| Rupesh Nasre (Indian Institute of Science), R Govindarajan (Indian Institute of Science)                                                                                        |     |
| Highly Scalable Distributed Dataflow Analysis                                                                                                                                   | 277 |
| Joseph L. Greathouse (University of Michigan, Ann Arbor), Chelsea LeBlanc (University of Michigan, Ann Arbor), Todd Austin (University of Michigan, Ann Arbor), Valeria Bertacco (University of Michigan, Ann Arbor) | |
| Flow-Sensitive Pointer Analysis for Millions of Lines of Code                                                                                                                   | 289 |