This paper examines two alternative approaches to supporting
code scheduling for multiple-instruction-issue processors.
One is to provide a set of non-trapping instructions so that the
compiler can perform aggressive static code scheduling. The
application of this approach to existing commercial architectures
typically requires extending the instruction set. The
other approach is to support out-of-order execution in the
microarchitecture so that the hardware can perform aggressive
dynamic code scheduling. This approach usually does not
require modifying the instruction set but requires complex
hardware support. In this paper, we analyze the performance of the two
alternative approaches using a set of important non-numerical C
benchmark programs. A distinguishing feature of the experiment
is that the code for the dynamic approach has been optimized and
scheduled as far as the architecture allows. The hardware
is responsible only for the additional reordering that cannot be
performed by the compiler. The overall result is that the
dynamic and static approaches are comparable in performance.
When applied to a four-instruction-issue processor, both
methods achieve more than a twofold speedup over a high-performance
single-instruction-issue processor. However, the
performance of each scheme varies among the benchmark programs.
To explain this variation, we have identified the conditions
in these programs that make one approach perform better than
the other.