The traditional method of extracting performance from programs is
based on scaling processor resources to execute multiple independent
instructions per cycle.
However, state-of-the-art compilers cannot expose enough
instruction-level parallelism to overcome the diminishing performance
returns of high-issue processors. Ultimately, performance is limited
not by machine resources but by the dependences within programs: the
fundamental dataflow limit.
Eliminating dynamic computation redundancy in designated regions of a
program allows execution to surpass the dataflow limit for sequences
of operations that would otherwise be executed redundantly, because
the results of a previously executed instance can be reused instead of
recomputed. Effectively exploiting this computation result locality
requires coordinating compiler and hardware techniques in an
integrated framework.
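As a rough illustration of result reuse, the following Python sketch models a small reuse table for one designated region. The region's live-in values form the lookup key; on a hit, the recorded live-out values are returned without re-executing the region's dependent chain of operations. The names `ReuseTable` and `region`, the fully associative organization, the LRU replacement policy, and the capacity are all assumptions made for illustration. They are not the hardware structures described in this work.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class ReuseTable:
    """Hypothetical fully associative, LRU-managed reuse table for one region."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        # Maps a tuple of live-in values to the tuple of live-out values.
        self.entries: "OrderedDict[tuple, tuple]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def execute(self, region: Callable[..., Tuple[int, ...]], *live_ins: int) -> Tuple[int, ...]:
        if live_ins in self.entries:
            # Inputs match a recorded instance: reuse its results and
            # skip executing the region's dependent operations.
            self.hits += 1
            self.entries.move_to_end(live_ins)
            return self.entries[live_ins]
        # Miss: execute the region normally, then record the instance.
        self.misses += 1
        live_outs = region(*live_ins)
        self.entries[live_ins] = live_outs
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return live_outs


def region(a: int, b: int) -> Tuple[int, ...]:
    # Hypothetical region: a dependent chain in which each operation
    # must wait for the previous one (the dataflow limit).
    t = a * b
    t = t + a
    t = t << 2
    return (t,)


table = ReuseTable()
inputs = [(3, 4), (5, 6), (3, 4), (3, 4), (5, 6)]
results = [table.execute(region, a, b)[0] for a, b in inputs]
print(results, table.hits, table.misses)  # [60, 140, 60, 60, 140] 3 2
```

Only the first occurrence of each input pair executes the chain; the three repeated instances are satisfied from the table, which is the redundancy that reuse removes.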
This work presents four key technologies, coordinated to eliminate
dynamic redundancy from program execution.
The Reusable Computation Region Framework, the Compiler-directed
Computation Reuse Approach (CCR), and the Dynamic Computation Management
System (DCMS) are new methods for dynamically directing a processor's
microarchitectural execution engine to use hardware that exploits
computation redundancy, and thereby improve program performance. The
compiler-based Value Optimization Framework introduces new compiler
techniques for synthesizing code based on the distribution of data
values. Systematically coordinating these compiler techniques and
hardware technologies can eliminate a significant amount of the
dynamic computation redundancy in program execution. Together, these
techniques offer new ways to improve the utilization and performance
of modern processors by exploiting the value locality readily
available in programs.
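The Value Optimization Framework's idea of synthesizing code from a data distribution can be sketched roughly as follows. The sketch assumes one simple form of this idea, value specialization. A value profile records how often each exponent `n` occurs. If one value dominates (above a hypothetical 0.8 threshold), straight-line code is generated for it and placed behind a runtime guard, and the general loop is kept as a fallback. The functions `dominant_value`, `general_power`, `synthesize_power`, and `specialize`, the threshold, and the use of Python's `exec` to stand in for compiler code generation are all illustrative assumptions. They are not the framework's actual transformation.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def dominant_value(samples: Iterable[int], threshold: float = 0.8) -> Optional[int]:
    """Return the most frequent value if its share of the profile meets `threshold`."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return None
    value, freq = counts.most_common(1)[0]
    return value if freq / total >= threshold else None


def general_power(x: int, n: int) -> int:
    # General code: the loop trip count is known only at run time.
    result = 1
    for _ in range(n):
        result *= x
    return result


def synthesize_power(k: int) -> Callable[[int], int]:
    # Emit straight-line code for the fixed exponent k (loop fully unrolled).
    body = " * ".join(["x"] * k) if k > 0 else "1"
    namespace: dict = {}
    exec(f"def power_k(x):\n    return {body}\n", namespace)
    return namespace["power_k"]


def specialize(profile: Iterable[int]) -> Callable[[int, int], int]:
    k = dominant_value(profile)
    if k is None:
        return general_power  # no dominant value: keep only the general code
    fast = synthesize_power(k)

    def guarded_power(x: int, n: int) -> int:
        if n == k:
            return fast(x)  # specialized path for the common value
        return general_power(x, n)  # guard failed: fall back to general code

    return guarded_power


profile = [3] * 9 + [5]  # hypothetical profile: n == 3 in 90% of samples
power = specialize(profile)
print(power(2, 3), power(2, 5))  # 8 32
```

The runtime guard keeps the transformation safe: values outside the profiled distribution still take the general path, so specialization only changes how quickly the common case runs, not what it computes.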