Modulo Scheduling is a method for overlapping successive
iterations of a loop in order to find sufficient instruction-
level parallelism to fully utilize high-issue-rate processors.
The throughput achieved by a modulo scheduled loop depends on the
resource requirements, the dependence pattern, and the
register requirements of the computation in the loop body.
Traditionally, unrolling followed by acyclic scheduling of
the unrolled body has been an alternative to modulo scheduling.
However, there are benefits to unrolling even if the loop is
to be modulo scheduled. Unrolling can improve the throughput
by allowing a smaller, non-integral effective initiation
interval to be achieved. After unrolling, optimizations can
be applied to the loop that reduce both the resource
requirements and the height of the critical paths. Together,
unrolling and unrolling-based optimizations can enable the
completion of multiple iterations per cycle in some cases.
This paper describes the benefits of unrolling and a set of
optimizations for unrolled loops which have been implemented
in the IMPACT compiler. The performance benefits of
unrolling for five of the SPEC CFP92 programs are reported.
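
The claim about non-integral effective initiation intervals can be illustrated with a small calculation. The sketch below is not taken from the IMPACT compiler; the function names effective_ii and best_unroll are hypothetical. It assumes that the loop has a known fractional lower bound on its initiation interval (for example, 3 memory operations on 2 memory ports gives 3/2 cycles per iteration), that the scheduler must pick a whole number of cycles for the loop body it is given, and that it actually achieves that minimum. Register pressure and the unrolling-based optimizations are ignored.

```python
from fractions import Fraction
from math import ceil


def effective_ii(fractional_mii: Fraction, unroll: int) -> Fraction:
    """Cycles per ORIGINAL iteration when the body is unrolled `unroll` times.

    Assumption: the unrolled body needs unroll * fractional_mii cycles,
    and the scheduler rounds that up to a whole number of cycles.
    """
    unrolled_ii = ceil(fractional_mii * unroll)  # integer II of the unrolled loop
    return Fraction(unrolled_ii, unroll)         # amortized over `unroll` iterations


def best_unroll(fractional_mii: Fraction, max_unroll: int) -> int:
    """Smallest unroll factor (up to max_unroll) with the lowest effective II."""
    return min(
        range(1, max_unroll + 1),
        key=lambda u: (effective_ii(fractional_mii, u), u),
    )


if __name__ == "__main__":
    bound = Fraction(3, 2)  # hypothetical loop: 3 memory ops, 2 memory ports
    for u in range(1, 5):
        print(u, effective_ii(bound, u))
    print("best unroll factor:", best_unroll(bound, 4))
```

Under these assumptions, the loop without unrolling is scheduled at 2 cycles per iteration, while unrolling twice allows 3 cycles per two iterations, an effective interval of 3/2. With a bound of 1/2, unrolling twice yields one unrolled iteration per cycle, which corresponds to two original iterations completing per cycle.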