With advances in VLSI technology, microprocessor designers can provide
more microarchitectural parallelism to increase performance. We have
identified four major forms of such parallelism: multiple
microoperations issued per cycle, multiple result distribution buses,
multiple execution units, and pipelined execution units. The experiments
reported in this paper address two important issues: the effects of
these forms and the appropriate balance among them. A central
microarchitecture serves as the basis for comparison. We separately
vary each form of microarchitectural parallelism in the central
microarchitecture to measure its individual effect on performance. In
addition, we vary two forms of microarchitectural parallelism in the
central microarchitecture to derive an appropriate balance between
them. To make fair
comparisons, our compiler generates different code sequences optimized
for different microarchitectural configurations. For each given set of
technology constraints, the results of these experiments can be used to
derive a cost-effective microarchitecture that executes a given set of
workload programs at high speed.
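The experimental design described above varies one form of parallelism
at a time around a central configuration, then varies two forms jointly.
It can be sketched as a configuration generator. This is a minimal
Python illustration, not the authors' tooling. The parameter names, the
central configuration values, and the swept values are all hypothetical.
Performance measurement is omitted here; it would require compiling
optimized code for, and simulating, each configuration.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class MicroarchConfig:
    # Hypothetical fields, one per form of microarchitectural parallelism.
    issue_width: int        # microoperations issued per cycle
    result_buses: int       # result distribution buses
    execution_units: int    # number of execution units
    pipelined_units: bool   # whether the execution units are pipelined


# Hypothetical central configuration used as the comparison basis.
CENTRAL = MicroarchConfig(issue_width=4, result_buses=4,
                          execution_units=4, pipelined_units=True)


def one_factor_sweep(central, field, values):
    """Vary a single form of parallelism; all other fields keep their central values."""
    return [replace(central, **{field: v}) for v in values]


def two_factor_grid(central, field_a, values_a, field_b, values_b):
    """Vary two forms jointly (full grid) to study the balance between them."""
    return [replace(central, **{field_a: a, field_b: b})
            for a, b in product(values_a, values_b)]


issue_sweep = one_factor_sweep(CENTRAL, "issue_width", [1, 2, 4, 8])
balance_grid = two_factor_grid(CENTRAL, "issue_width", [2, 4, 8],
                               "result_buses", [1, 2, 4])
print(len(issue_sweep), len(balance_grid))  # 4 9
```

Using a frozen dataclass with `dataclasses.replace` guarantees that
every generated configuration differs from the central one only in the
fields being swept. That is the property the one-factor and two-factor
comparisons rely on.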