Exploiting parallelism at both the multiprocessor level and
the instruction level is an effective means for supercomputers to
achieve high performance. The amount of instruction-level
parallelism available to superscalar or VLIW node processors
can be limited, however, when only conventional compiler
optimization techniques are applied. In this paper, a set of compiler transformations
designed to increase instruction-level parallelism is described.
The effectiveness of these transformations is evaluated using
40 loop nests extracted from a range of supercomputer applications.
This evaluation shows that increasing execution resources in
superscalar/VLIW node processors yields little performance
improvement unless loop unrolling and register renaming are
applied. It also reveals that these two transformations are
sufficient for DOALL loops. However, more advanced
transformations are required in order for serial and DOACROSS
loops to fully benefit from the increased execution resources.
The results show that the six additional transformations studied
satisfy most of this need.
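
To make the role of loop unrolling and register renaming concrete, the following C sketch illustrates the two transformations on a simple DOALL-style loop. It is not taken from the paper; the kernel, function names, and unroll factor of two are illustrative assumptions. After unrolling, both copies of the loop body would reuse the temporary t, creating false (anti- and output-) dependences at the register level; renaming it into t0 and t1 lets a superscalar/VLIW scheduler overlap the two copies.

    /* Original DOALL loop: iterations are independent, but the body is
     * short, so little instruction-level parallelism is exposed. */
    void square_sum(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i++) {
            float t = a[i] + b[i];
            c[i] = t * t;
        }
    }

    /* Unrolled by a factor of 2 with the temporary renamed per copy.
     * (Assumes n is even, to keep the sketch short.) */
    void square_sum_unrolled(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i += 2) {
            float t0 = a[i]     + b[i];
            float t1 = a[i + 1] + b[i + 1];
            c[i]     = t0 * t0;
            c[i + 1] = t1 * t1;
        }
    }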