Instruction schedulers for superscalar and VLIW processors
must expose sufficient instruction-level parallelism to the
hardware in order to achieve high performance. Traditional
compiler instruction scheduling techniques typically take
into account the constraints imposed by all execution
scenarios in the program. However, there are additional
opportunities to increase instruction-level parallelism
for the frequent execution scenarios at the expense of the
less frequent ones. Profile information identifies these
important execution scenarios in a program. In this paper,
two major categories of profile information are studied:
control-flow and memory-dependence. Two profile-assisted
code scheduling techniques, acyclic global scheduling and
software pipelining, have been incorporated into the
IMPACT-I compiler. This paper describes
the scheduling algorithms, highlights the modifications
required to use profile information, and explains the
hardware and compiler support for dealing with hazards
that arise from aggressive use of profile information.
The effectiveness of these profile-based scheduling
techniques is evaluated for a range of superscalar
and VLIW processors.
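As an illustration of how control-flow profile information can single out the frequent execution scenarios, the sketch below greedily grows a trace of basic blocks along the most frequently executed successor edge. It is a minimal sketch under stated assumptions, not the IMPACT-I algorithm: the edge-count dictionary, the function name `select_trace`, and the `min_ratio` bias threshold are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical control-flow profile: (src_block, dst_block) -> execution count.
EdgeProfile = Dict[Tuple[str, str], int]


def select_trace(entry: str, profile: EdgeProfile, min_ratio: float = 0.6) -> List[str]:
    """Greedily grow a trace from `entry` along the hottest outgoing edge.

    Growth stops when the current block has no profiled successors, when the
    hottest successor is already on the trace (e.g. a loop back edge), or when
    the hottest edge carries less than `min_ratio` of the block's outgoing
    count, i.e. the branch is not biased enough to favor one path.
    """
    trace = [entry]
    current = entry
    while True:
        # Collect the profiled successors of the current block.
        succs = {dst: n for (src, dst), n in profile.items() if src == current}
        total = sum(succs.values())
        if total == 0:
            break
        best, count = max(succs.items(), key=lambda kv: kv[1])
        if best in trace or count / total < min_ratio:
            break
        trace.append(best)
        current = best
    return trace


if __name__ == "__main__":
    # Hypothetical loop body: A branches to B 90% of the time, to C 10%.
    profile = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90,
               ("C", "D"): 10, ("D", "A"): 95, ("D", "E"): 5}
    print(select_trace("A", profile))  # ['A', 'B', 'D']
```

Once the blocks on such a trace are treated as one scheduling unit, instructions may move above the rarely taken A→C branch, so the frequent path gains parallelism while the infrequent path absorbs any cost of that motion.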