Numerical applications frequently contain nested loop
structures that process large arrays of data. The
execution of these loop structures often produces memory
reference patterns that make poor use of data caches.
Limited cache associativity and capacity cause conflict
misses, and non-unit-stride access patterns leave much of
each fetched cache line unused. Data copying has been
proposed and investigated as a way to reduce cache conflict
misses [1][2], but because it performs the copy operations
entirely in software, it incurs high execution overhead.
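To make the cost of software copying concrete, the following C sketch shows a generic example of data copying. It is not taken from [1][2]. The example is a tiled matrix multiply that copies each tile of B into a contiguous buffer before reusing it. In the original matrix, the rows of a tile are n elements apart and can map to the same cache sets when n is a power of two. In the buffer they are adjacent, so in a conventionally indexed cache a tile no larger than the cache does not conflict with itself. The tile size TILE and the names copy_tile and matmul_copy are hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge (in elements): 32 x 32 doubles = 8 KB. A real
   value would be chosen from the target cache's size and associativity. */
#define TILE 32

/* Copy a rows x cols tile of the row-major matrix src (row length ld),
   starting at (r0, c0), into the contiguous buffer dst. */
static void copy_tile(const double *src, size_t ld, size_t r0, size_t c0,
                      size_t rows, size_t cols, double *dst)
{
    assert(rows <= TILE && cols <= TILE);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[i * cols + j] = src[(r0 + i) * ld + (c0 + j)];
}

/* C += A * B for n x n row-major matrices. Each tile of B is copied into
   buf once and then reused for every row i of A and C. */
void matmul_copy(size_t n, const double *A, const double *B, double *C)
{
    double buf[TILE * TILE]; /* contiguous copy of the current B tile */
    for (size_t kk = 0; kk < n; kk += TILE) {
        size_t kb = (n - kk < TILE) ? n - kk : TILE;
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t jb = (n - jj < TILE) ? n - jj : TILE;
            /* Software copy: these loads and stores do no arithmetic. */
            copy_tile(B, n, kk, jj, kb, jb, buf);
            for (size_t i = 0; i < n; i++)
                for (size_t k = 0; k < kb; k++) {
                    double a = A[i * n + kk + k];
                    for (size_t j = 0; j < jb; j++)
                        C[i * n + jj + j] += a * buf[k * jb + j];
                }
        }
    }
}
```

Each call to copy_tile executes extra instructions purely to move data, and this is the software overhead described above. Copying pays off only when a tile is reused often enough to amortize that cost.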
I propose a combined hardware and software technique,
called data relocation and prefetching, which uses special
hardware to eliminate much of the overhead of data copying.
Furthermore, relocating the data while performing software
prefetching reduces the copying overhead further.
Experimental results show that data relocation and
prefetching substantially improve cache performance.
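The proposed mechanism depends on special hardware, which plain C cannot express. The sketch below is only a rough, software-only analogue of relocating data while prefetching. In a single loop, it gathers a non-unit-stride sequence into a contiguous buffer and issues a prefetch hint a fixed distance ahead. The function name relocate_with_prefetch and the distance PF_DIST are hypothetical. `__builtin_prefetch` is a GCC/Clang extension. The sketch does not model the hardware or reproduce the experimental results.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; in practice it would be
   tuned to memory latency and the work done per element. */
#define PF_DIST 8

/* Gather count elements that lie stride elements apart in src into the
   contiguous buffer dst, prefetching the element PF_DIST iterations ahead.
   The prefetch is only a hint and does not change the results. */
void relocate_with_prefetch(const double *src, size_t stride, size_t count,
                            double *dst)
{
    assert(stride >= 1);
    for (size_t i = 0; i < count; i++) {
        /* The bounds check keeps the prefetch address inside src. */
        if (i + PF_DIST < count)
            __builtin_prefetch(&src[(i + PF_DIST) * stride],
                               0 /* read */, 0 /* source read only once */);
        dst[i] = src[i * stride];
    }
}
```

For example, relocate_with_prefetch(m + 2, 5, 6, col) copies column 2 of a 6 x 5 row-major matrix m into col. Later loops then read col with unit stride and use every element of each fetched cache line.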