Code Scheduling for VLIW/Superscalar Processors with Limited Register Files
Publication Year: 1992
  Tokuzo Kiyohara, John C. Gyllenhaal
  Proceedings of the 25th International Symposium on Microarchitecture, pp. 197-201, Dec. 1992

Moderate-size register files can limit the performance of loop unrolling on multiple-issue processors. Current scheduling heuristics schedule the unrolled iterations breadth-first, which increases register pressure and generates excessive spill code.
A heuristic is proposed that causes a more depth-first scheduling of unrolled iterations. This heuristic reduces the overlapping of the unrolled iterations and, as a result, reduces register pressure. The experimental evaluation shows increased performance on processors with 32 or 64 registers. In addition, the performance of dependency-removing optimizations is stabilized, so that applying additional optimizations is more likely to increase performance.
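
The contrast between breadth-first and depth-first scheduling can be illustrated with a toy list scheduler. The sketch below is not the heuristic from the paper, whose details the abstract does not give. It assumes a hypothetical unrolled loop body (two loads, a multiply, and a store per iteration), unit latencies, and a 2-wide issue machine. The names `build_unrolled_body`, `list_schedule`, and `max_live_values` are invented for illustration. The only difference between the two runs is the priority function: height first (breadth-first across iterations) versus iteration index first (depth-first).

```python
from collections import namedtuple

# Hypothetical toy operation: name, unrolled-iteration index,
# height (longest path to the end of its chain), and predecessor names.
Op = namedtuple("Op", "name iteration height preds")


def build_unrolled_body(unroll):
    """Assumed loop body per iteration: two loads -> multiply -> store."""
    ops = {}
    for i in range(unroll):
        ops[f"ld_a{i}"] = Op(f"ld_a{i}", i, 3, ())
        ops[f"ld_b{i}"] = Op(f"ld_b{i}", i, 3, ())
        ops[f"mul{i}"] = Op(f"mul{i}", i, 2, (f"ld_a{i}", f"ld_b{i}"))
        ops[f"st{i}"] = Op(f"st{i}", i, 1, (f"mul{i}",))
    return ops


def list_schedule(ops, issue_width, priority):
    """Cycle-by-cycle list scheduling with unit latencies.

    An op is ready once all predecessors issued in an earlier cycle.
    The lowest `priority` keys among the ready ops are issued first.
    """
    issue_cycle = {}
    cycle = 0
    while len(issue_cycle) < len(ops):
        ready = [
            op for op in ops.values()
            if op.name not in issue_cycle
            and all(issue_cycle.get(p, cycle) < cycle for p in op.preds)
        ]
        for op in sorted(ready, key=priority)[:issue_width]:
            issue_cycle[op.name] = cycle
        cycle += 1
    return issue_cycle


def max_live_values(ops, issue_cycle):
    """Peak number of values defined before a cycle and still used at or after it."""
    last_use = {}
    for op in ops.values():
        for p in op.preds:
            last_use[p] = max(last_use.get(p, 0), issue_cycle[op.name])
    length = max(issue_cycle.values()) + 1
    return max(
        sum(1 for name, end in last_use.items() if issue_cycle[name] < c <= end)
        for c in range(length)
    )


def breadth_first(op):
    # Height dominates, so the loads of every iteration are issued first.
    return (-op.height, op.iteration)


def depth_first(op):
    # Iteration index dominates, so earlier iterations finish first.
    return (op.iteration, -op.height)


ops = build_unrolled_body(unroll=4)
bf_schedule = list_schedule(ops, issue_width=2, priority=breadth_first)
df_schedule = list_schedule(ops, issue_width=2, priority=depth_first)
breadth = max_live_values(ops, bf_schedule)
depth = max_live_values(ops, df_schedule)
print("breadth-first peak live values:", breadth)
print("depth-first peak live values:  ", depth)
```

In this toy setup the breadth-first priority keeps the loaded values of all four iterations live at once, while the depth-first priority retires each iteration before starting much of the next, at the cost of a slightly longer schedule. A real scheduler would also model latencies, register classes, and spill costs, none of which are represented here.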