IMPACT



Publication Year:
	1998

Authors:
	Ben-Chung Cheng, Daniel A. Connors, Wen-mei Hwu

Published:
	Proceedings of the 31st International Symposium on Microarchitecture, December 1998

Abstract:
	Two orthogonal hardware techniques for reducing the latency of load instructions, table-based address prediction and early address calculation, have recently been proposed. The key idea behind both techniques is to speculatively perform loads early in the processor pipeline using predicted values for the loads' addresses. To accurately predict the important loads in the presence of a large number of less-important loads, these techniques have required either a large hardware table or complex register bypass logic. This paper proposes a compiler-directed approach that allows a streamlined version of both techniques to be used together effectively. The compiler provides directives to indicate which prediction mechanism to use or, when appropriate, that no prediction should be made. Each hardware mechanism can therefore be focused on its target cases, so that a smaller prediction table and simpler bypass logic suffice. Our results show that through straightforward compiler heuristics, we obtain an average speedup of 34% with a 256-entry direct-mapped address table and only one cached register. With the help of address profiling, an additional 4% speedup can be obtained.
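
	The interplay described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the Directive names, the last-address-plus-stride table entry, the PC-modulo indexing, and the speculative_address helper are all hypothetical choices made for illustration. Only the 256-entry direct-mapped table and the single cached register come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Directive(Enum):
    """Hypothetical per-load compiler hint; the names are illustrative."""
    NO_PREDICT = 0     # do not speculate on this load
    TABLE_PREDICT = 1  # use the address prediction table
    EARLY_CALC = 2     # compute base + offset early from the cached register


@dataclass
class Entry:
    tag: int = -1      # PC of the load that owns this entry (-1 means empty)
    last_addr: int = 0
    stride: int = 0


class AddressTable:
    """Direct-mapped table; the last-address-plus-stride entry is an assumption."""

    def __init__(self, entries: int = 256):
        self.entries = [Entry() for _ in range(entries)]

    def _slot(self, pc: int) -> Entry:
        # Direct-mapped: each PC maps to exactly one slot.
        return self.entries[pc % len(self.entries)]

    def predict(self, pc: int) -> Optional[int]:
        e = self._slot(pc)
        if e.tag != pc:
            return None  # empty slot or slot owned by another load
        return e.last_addr + e.stride

    def update(self, pc: int, addr: int) -> None:
        e = self._slot(pc)
        if e.tag == pc:
            e.stride = addr - e.last_addr
        else:
            e.tag, e.stride = pc, 0  # evict the previous owner
        e.last_addr = addr


def speculative_address(directive: Directive, pc: int, table: AddressTable,
                        cached_reg: Optional[int], offset: int) -> Optional[int]:
    """Return an address to speculate on, or None to wait for the real one."""
    if directive is Directive.TABLE_PREDICT:
        return table.predict(pc)
    if directive is Directive.EARLY_CALC and cached_reg is not None:
        return cached_reg + offset
    return None


table = AddressTable()
table.update(0x40, 1000)  # first execution of the load at PC 0x40
table.update(0x40, 1008)  # second execution establishes a stride of 8
predicted = speculative_address(Directive.TABLE_PREDICT, 0x40, table, None, 0)
early = speculative_address(Directive.EARLY_CALC, 0x44, table, 2000, 16)
skipped = speculative_address(Directive.NO_PREDICT, 0x48, table, 2000, 16)
```

	In this model, loads tagged NO_PREDICT never touch the table, which is one way to picture why a small table could suffice once the compiler filters out the less-important loads.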