Compiler Code Transformations for Superscalar-Based High-Performance Systems.
Publication Year: 1992
  Scott A. Mahlke, William Y. Chen, John C. Gyllenhaal, Wen-mei Hwu, Pohua P. Chang, Tokuzo Kiyohara
  Proceedings of Supercomputing 1992, Minneapolis, Minnesota, pp. 808-817, Nov. 16-20, 1992

Exploiting parallelism at both the multiprocessor level and the instruction level is an effective means for supercomputers to achieve high performance. The amount of instruction-level parallelism available to superscalar or VLIW node processors can be limited, however, with conventional compiler optimization techniques. In this paper, a set of compiler transformations designed to increase instruction-level parallelism is described. The effectiveness of these transformations is evaluated using 40 loop nests extracted from a range of supercomputer applications. This evaluation shows that increasing execution resources in superscalar/VLIW node processors yields little performance improvement unless loop unrolling and register renaming are applied. It also reveals that these two transformations are sufficient for DOALL loops. However, more advanced transformations are required in order for serial and DOACROSS loops to fully benefit from the increased execution resources. The results show that the six additional transformations studied satisfy most of this need.
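
As a rough illustration of the two baseline transformations named in the abstract, the sketch below applies loop unrolling and register renaming by hand to a small DOALL loop. It is not taken from the paper: the function names `scale_add` and `scale_add_unrolled`, the unroll factor of 4, and the loop body itself are assumptions chosen for the example, and a real compiler would perform these steps on its intermediate code rather than on C source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DOALL loop: iterations are independent, but every
 * iteration reuses the single temporary t, so a scheduler that sees
 * only one iteration at a time finds little to overlap. */
void scale_add(const double *a, const double *b, double *c,
               size_t n, double k)
{
    assert(n == 0 || (a && b && c));
    double t;
    for (size_t i = 0; i < n; i++) {
        t = a[i] * k;
        c[i] = t + b[i];
    }
}

/* Same computation, unrolled by an assumed factor of 4. Each copy of
 * the body gets its own temporary (t0..t3), which is the source-level
 * analogue of register renaming: the four multiply/add chains no
 * longer share a name and can be issued side by side. */
void scale_add_unrolled(const double *a, const double *b, double *c,
                        size_t n, double k)
{
    assert(n == 0 || (a && b && c));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double t0 = a[i]     * k;
        double t1 = a[i + 1] * k;
        double t2 = a[i + 2] * k;
        double t3 = a[i + 3] * k;
        c[i]     = t0 + b[i];
        c[i + 1] = t1 + b[i + 1];
        c[i + 2] = t2 + b[i + 2];
        c[i + 3] = t3 + b[i + 3];
    }
    /* Remainder loop for trip counts that are not a multiple of 4. */
    for (; i < n; i++) {
        c[i] = a[i] * k + b[i];
    }
}
```

Both functions compute the same result for any `n`; the unrolled version simply exposes four independent operation chains per loop iteration. This only works so directly because the loop is DOALL. A loop whose iterations depend on one another would still be serialized by those dependences after unrolling, which is the case the abstract says needs further transformations.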