Moderate-size register files can limit the performance
of loop unrolling on multiple-issue processors. With
current scheduling heuristics, a breadth-first scheduling
of iterations occurs, increasing register pressure and
generating excessive spill code.
A heuristic is proposed that causes a more depth-first
scheduling of unrolled iterations. This heuristic reduces
the overlapping of the unrolled iterations and, as a result,
reduces register pressure. The experimental evaluation
shows increased performance on processors with 32 or 64
registers. In addition, the performance of dependency-removing
optimizations is stabilized, so that applying
additional optimizations is more likely to increase
performance.
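
The effect of scheduling order on register pressure can be illustrated with a toy model. The sketch below is not the proposed heuristic: it assumes a hypothetical loop body (two loads, a multiply, a store), ignores latencies and issue width, and assumes a destination may reuse a register freed by an operand that dies at the same operation. The names `unrolled_bodies`, `breadth_first`, `depth_first`, and `peak_live` are illustrative only.

```python
def unrolled_bodies(unroll):
    """Build `unroll` copies of a hypothetical body: c[i] = a[i] * b[i].

    Each operation is (destination value or None, list of source values).
    """
    return [
        [
            (f"a{i}", []),                    # load a[i]
            (f"b{i}", []),                    # load b[i]
            (f"m{i}", [f"a{i}", f"b{i}"]),    # multiply
            (None, [f"m{i}"]),                # store c[i]
        ]
        for i in range(unroll)
    ]


def breadth_first(unroll):
    """Issue the k-th operation of every iteration before any (k+1)-th one."""
    bodies = unrolled_bodies(unroll)
    return [body[k] for k in range(len(bodies[0])) for body in bodies]


def depth_first(unroll):
    """Finish each unrolled iteration before starting the next."""
    return [op for body in unrolled_bodies(unroll) for op in body]


def peak_live(schedule):
    """Return the maximum number of simultaneously live values."""
    # Record the index of the last operation that reads each value.
    last_use = {}
    for idx, (_, sources) in enumerate(schedule):
        for src in sources:
            last_use[src] = idx

    live, peak = set(), 0
    for idx, (dest, sources) in enumerate(schedule):
        # Operands that die here free their registers first (assumption).
        for src in sources:
            if last_use[src] == idx:
                live.discard(src)
        if dest is not None:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


if __name__ == "__main__":
    for unroll in (2, 4, 8):
        print(unroll, peak_live(breadth_first(unroll)), peak_live(depth_first(unroll)))
```

In this model the breadth-first order keeps both loaded operands of every unrolled iteration live at once, so the peak grows with the unroll factor (8 values at an unroll factor of 4), while the depth-first order needs only 2 regardless of the unroll factor. A real scheduler has to trade this reduction against the parallelism lost by overlapping iterations less.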