HyperLink   Effective Modulo Scheduling for Clustered Architectures.
Paper of IMPACT - Cited Greater Than 100 Times
   
Publication Year:
  1998
Authors
  Erik M. Nystrom, Alexandre E. Eichenberger
   
Published:
  Proceedings of the 31th International Symposium on Microarchitecture, Dec, 1998
   
Abstract:

Clustering is one solution to the demand for wide-issue machines and fast clock cycles because it allows for smaller, less ported register files and simpler bypass logic while remaining scaleable. Much of the previous work on scheduling for clustered architectures has focused on acyclic code. While minimizing schedule length of acyclic code is paramount, the primary objective when scheduling cyclic code is to maximize the throughput or steady state performance. This paper investigates a pre-modulo scheduling pass that performs cluster assignment in a way that minimizes performance degradation do to explicit communication required as the loops are split over clusters. The cluster assignment algorithm annotates and adjusts the graph for use by the scheduler so that any traditional modulo scheduling algorithm, having no knowledge of clustering, can produce a valid and efficient schedule for a clustered machine.

The algorithm was tested on various machine configurations including ones with general function units and fully specialized function units. The results show that, given a reasonable number of buses, the assignment algorithm allows for a modulo schedule on a clustered machine equal to that of a unified machine for 98% and 95% of loops on general and fully specialized machines, respectively, for both two and four cluster configurations.