It is increasingly necessary for the compiler to overlap successive loop iterations in order to
find sufficient instruction-level parallelism to effectively utilize the resources of high-performance
processors. Two competing methods have been developed for moving instructions across iteration
boundaries: unrolling followed by global acyclic scheduling and software pipelining. This
dissertation investigates modulo scheduling, a software pipelining technique. Much of the previous
work on modulo scheduling has targeted the relatively well-behaved loops in numeric
programs. This dissertation develops new techniques that allow modulo scheduling to be effectively
applied to control-intensive non-numeric programs. These techniques overcome the
restrictions imposed by problematic control flow and loop exits.
This dissertation also demonstrates that unrolling-based optimization prior to scheduling
improves the performance of modulo scheduled loops and is, in fact, necessary to allow modulo
scheduling to surpass the performance of acyclic scheduling for control-intensive general-purpose
programs. Modulo scheduling has the following advantages over the acyclic scheduling approach
for control-intensive general-purpose programs. First, modulo scheduling increases performance
by maintaining the overlap of loop iterations throughout the execution of the loop. Second,
modulo scheduling reduces register pressure by initiating iterations at a consistent rate that
is sustainable for the given resources and dependence structure. Third, with the appropriate
architectural support, modulo scheduling results in less code expansion because unrolling is
required only for optimization, but not to amortize the loss of overlap across the back edge.