Recent studies on value locality reveal that many instructions are
frequently executed with only a small set of distinct input values. This
paper proposes an approach that integrates architecture and
compiler techniques to exploit value locality for large regions
of code. The approach
aims to eliminate redundant execution caused by both
instruction-level input repetition and the recurrence of input data within
high-level computations. In this approach, the compiler performs
analysis to identify code regions whose computation can be reused
during dynamic execution. The instruction set architecture provides a
simple interface for the compiler to communicate the scope of each
reuse region and its live-out register information to the hardware.
At run time, the execution results of these reusable computation
regions are recorded into hardware buffers for potential reuse. Each
reuse can eliminate the execution of a large number of dynamic
instructions. Furthermore, the actions needed to update the live-out
registers can be performed at a higher degree of parallelism than the
original code, breaking intrinsic dataflow dependence constraints.
Initial results show that the compiler analysis can indeed identify
large reuse regions. Overall, the approach can improve
the performance of a 6-issue microarchitecture by
an average of 30% for a collection of SPEC and integer
benchmarks.
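
The record-and-reuse mechanism described above can be illustrated with a small
software model. This is only a sketch under stated assumptions, not the
hardware design evaluated in the paper: the names ReuseBuffer, execute, and
sum_of_squares are hypothetical, the buffer is unbounded, and a region is
assumed to depend only on its live-in registers (memory inputs are not
modeled). The compiler's role is represented by the region identifier and the
list of live-in registers passed to execute.

```python
from typing import Callable, Dict, Tuple

Registers = Dict[str, int]


class ReuseBuffer:
    """Toy software model of a computation reuse buffer (hypothetical)."""

    def __init__(self) -> None:
        # Maps (region id, live-in values) to the recorded live-out values.
        self.entries: Dict[Tuple[str, Tuple[int, ...]], Registers] = {}
        self.hits = 0
        self.misses = 0

    def execute(
        self,
        region_id: str,
        live_in: Tuple[str, ...],
        body: Callable[[Registers], Registers],
        regs: Registers,
    ) -> None:
        """Run a reuse region, skipping its body when the inputs repeat."""
        key = (region_id, tuple(regs[r] for r in live_in))
        recorded = self.entries.get(key)
        if recorded is not None:
            # Reuse: update all live-out registers at once, body is skipped.
            self.hits += 1
            regs.update(recorded)
            return
        # Miss: execute the region and record its live-out values.
        self.misses += 1
        live_out = body(regs)
        self.entries[key] = dict(live_out)
        regs.update(live_out)


def sum_of_squares(regs: Registers) -> Registers:
    """Example region: live-in r1 (loop bound), live-out r2 (result)."""
    total = 0
    for i in range(regs["r1"]):
        total += i * i
    return {"r2": total}
```

Calling execute twice with the same value in r1 runs the loop once and then
satisfies the second call from the buffer. In this model, the single
regs.update on a hit stands in for updating the live-out registers in
parallel rather than through the region's original dependence chain.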