Improved Superblock Optimization in GCC.
   
Publication Year:
  2006
Authors:
  Robert E. Kidd, Wen-mei Hwu
   
Published:
  Proceedings of the GCC Developers' Summit, pp. 85-96, June 2006
   
Abstract:

Superblock scheduling is a common technique for increasing instruction-level parallelism (ILP) in generated code. Compared to a
basic block, the Superblock gives an optimizer or scheduler a longer range over which instructions can be moved. The bookkeeping necessary
to execute such a move is less than would be necessary inside an arbitrary trace region. Additionally, the process of forming Superblocks generates more instructions that are eligible for movement. These factors combine to produce
a significant increase in the ILP in a section of code.
By identifying the key feature of Superblock formation that allows this increase in ILP, we can generalize the concept to describe a class of
similar optimizations. We refer to techniques in this class as structural techniques. Combining several optimizations in this class with aggressive
classical optimization has been shown in the OpenIMPACT compiler to be particularly useful in developing ILP when compiling for the Itanium processor.
As motivation for our work, we present an investigation into the value of structural compilation in the OpenIMPACT compiler. In this domain, structural techniques have been credited with a 10% to 13% increase in code performance over a compiler that implements only classical optimizations.

As a first step toward developing structural compilation techniques in GCC, we implemented Superblock formation at the Tree-SSA level. By performing structural transformations early, we give the compiler's high-level optimizers
an opportunity to specialize the transformed program, thereby cultivating higher levels of ILP. The early results of this modification are mixed, with some benchmarks improving and others slowing. In this paper, we present
details on our implementation and study the effects of this structural transformation on later optimizations. Through this, we hope to motivate
future work to implement and improve optimizations that can take advantage of the transformed control flow.
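
Illustration (not part of the paper): the abstract does not spell out how a Superblock is formed, so the following is only a minimal sketch of tail duplication, the step commonly used to remove side entrances from a trace. It is not the authors' Tree-SSA implementation. The function name form_superblock, the dictionary-based control flow graph, and the primed names for copied blocks are all assumptions made for this example. The trace is assumed to be already selected and to list each block once.

```python
def form_superblock(cfg, trace):
    """Remove side entrances from `trace` by tail duplication.

    cfg   : dict mapping block name -> list of successor block names
    trace : list of block names along one path through cfg
    Returns a new dict; the input is not modified.
    (Hypothetical helper for illustration, not a GCC interface.)
    """
    on_trace = set(zip(trace, trace[1:]))  # edges that follow the trace

    def has_side_entrance(block):
        return any(block in succs and (src, block) not in on_trace
                   for src, succs in cfg.items())

    # First block after the head that can be entered from off the trace.
    first = next((i for i in range(1, len(trace))
                  if has_side_entrance(trace[i])), None)
    if first is None:
        return {b: list(s) for b, s in cfg.items()}

    # Every block from that point on gets a copy (named with a prime).
    copy = {b: b + "'" for b in trace[first:]}

    new_cfg = {}
    for src, succs in cfg.items():
        # Edges into the tail that do not follow the trace go to the copies.
        new_cfg[src] = [copy[d] if d in copy and (src, d) not in on_trace else d
                        for d in succs]
    for b, b_copy in copy.items():
        # A copy branches to copies of tail blocks, to originals otherwise.
        new_cfg[b_copy] = [copy.get(d, d) for d in cfg[b]]
    return new_cfg


# Diamond: A branches to B (hot) or C (cold); both rejoin at D, then E.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
sb = form_superblock(cfg, ["A", "B", "D", "E"])
print(sb)
```

In the transformed graph, D and E on the hot path can be entered only through B, so instructions can be moved along A, B, D, E without compensation code for a join. The copies D' and E' are the additional instructions created by formation, and they can be optimized separately for the cold path.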