Improvements in main memory speeds have not kept pace with increasing
processor clock frequency and improved exploitation of instruction-level
parallelism. Consequently, the gap between processor and main memory
performance is expected to grow, increasing the number of execution
cycles spent waiting for memory accesses to complete. One solution to
this growing problem is to reduce the number of cache misses by
increasing the effectiveness of the cache hierarchy. In this paper, we
present a technique that dynamically analyzes a program's data access
behavior and uses the results to proactively guide the placement of data
within the cache hierarchy in a location-sensitive manner. We introduce
the concept of a macroblock, which makes it feasible to characterize
the memory locations accessed by a program, and a Memory Address Table,
which performs the dynamic reference analysis. Our technique is fully
compatible with existing Instruction Set Architectures. Results from
detailed simulations of several integer programs show significant
speedups.
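
To make the two structures named above more concrete, the following is a
minimal sketch of how a macroblock and a Memory Address Table might fit
together. It is an illustration under stated assumptions, not the design
evaluated in the paper: it assumes a macroblock is a fixed-size,
power-of-two-aligned region of memory, and that the table keeps one
saturating access counter per macroblock. The names MACROBLOCK_BYTES,
COUNTER_MAX, record_access and is_frequently_used are hypothetical, as
are the sizes chosen.

```python
from collections import defaultdict

# Assumed parameters for illustration only; the abstract does not give sizes.
MACROBLOCK_BYTES = 1024   # hypothetical macroblock size (power of two)
COUNTER_MAX = 255         # hypothetical saturating-counter limit


class MemoryAddressTable:
    """Toy software model: one saturating access counter per macroblock."""

    def __init__(self, macroblock_bytes=MACROBLOCK_BYTES, counter_max=COUNTER_MAX):
        # Require a power of two so the macroblock index is a simple shift.
        if macroblock_bytes <= 0 or macroblock_bytes & (macroblock_bytes - 1):
            raise ValueError("macroblock_bytes must be a positive power of two")
        self.shift = macroblock_bytes.bit_length() - 1
        self.counter_max = counter_max
        self.counters = defaultdict(int)

    def macroblock_of(self, address):
        # All addresses in the same aligned region share one table entry.
        return address >> self.shift

    def record_access(self, address):
        # Increment the counter for this macroblock, saturating at counter_max.
        mb = self.macroblock_of(address)
        self.counters[mb] = min(self.counters[mb] + 1, self.counter_max)
        return self.counters[mb]

    def is_frequently_used(self, address, threshold):
        # A placement policy could consult this before caching a block.
        return self.counters.get(self.macroblock_of(address), 0) >= threshold


if __name__ == "__main__":
    mat = MemoryAddressTable()
    for addr in (0x1000, 0x1004, 0x13FF, 0x1400):
        mat.record_access(addr)
    print(mat.is_frequently_used(0x1000, threshold=3))
    print(mat.is_frequently_used(0x1400, threshold=3))
```

Grouping addresses into macroblocks keeps the number of tracked entries
small compared with tracking every cache block, which is presumably what
makes the characterization feasible. A hardware table would also be
finite and would need a replacement policy, which this sketch omits.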