Architectural and Software Support for Executing Numerical Applications on High Performance Computers.
   
Publication Year:
  1993
Authors
  Sadun Anik
   
Published:
  PhD thesis, Department of Computer Science, University of Illinois, Urbana IL, CRHC-93-19, Sept. 1993
   
Abstract:

Numerical applications require large amounts of computing power. Although shared memory multiprocessors provide a cost-effective platform for parallel execution of numerical programs, parallel processing has not delivered the expected performance on these machines. There are two crucial steps in parallel execution of numerical applications: (1) effective parallelization of an application and (2) efficient execution of the parallel program on a multiprocessor. This thesis addresses the second step within the scope of automatically parallelized FORTRAN programs.

This thesis identifies the mismatch between the needs of parallelized FORTRAN programs and the support for parallel execution in shared memory multiprocessors as a cause of poor performance. It addresses this problem from two angles: architectural and software support for parallel execution, and compiler transformations to enhance program characteristics. Architectural features and synchronization and scheduling algorithms are studied to increase the efficiency of support for parallel execution. It is shown that architectures supporting atomic fetch&add primitives and synchronization buses can execute programs more effectively. New algorithms for lock access and parallel task scheduling are proposed.
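
As an illustration of why an atomic fetch&add primitive helps with scheduling, the following minimal sketch shows loop self-scheduling, where each worker claims its next iteration by a single fetch&add on a shared counter. It is not the thesis's algorithm. The names `FetchAndAdd`, `self_schedule`, and `demo_squares` are hypothetical, and the lock inside `FetchAndAdd` is only a software stand-in for the hardware primitive, which would perform the same read-and-increment in one indivisible operation.

```python
import threading


class FetchAndAdd:
    """Software emulation of an atomic fetch&add cell (assumption: a lock
    stands in for the hardware primitive)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        # Return the old value and add delta as one indivisible step.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def self_schedule(n_iters, n_workers, body):
    """Run body(i) for every i in range(n_iters) using n_workers threads.
    Each worker claims its next iteration with one fetch&add."""
    next_iter = FetchAndAdd(0)

    def worker():
        while True:
            i = next_iter.fetch_add(1)  # claim one iteration
            if i >= n_iters:            # no iterations left
                return
            body(i)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo_squares(n_iters=100, n_workers=4):
    """Hypothetical loop body: store i*i at index i."""
    out = [None] * n_iters

    def body(i):
        out[i] = i * i

    self_schedule(n_iters, n_workers, body)
    return out
```

Because every claim returns a distinct old value, no iteration is executed twice and none is skipped, regardless of how the workers interleave.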

The thesis also explores transformations that modify parallel program characteristics to increase the efficiency of parallel execution on less sophisticated architectures. It is shown that applying blocking transformations to nested parallel loops decreases the need for scheduling and synchronization operations. This increases the efficiency of parallel execution, especially on multiprocessors with simplistic support for interprocessor synchronization.
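
The effect of blocking on scheduling overhead can be sketched as follows. This is a simplified illustration rather than the transformation as implemented in the thesis: a two-deep parallel loop nest over an `n` by `m` iteration space is split into rectangular blocks, and one task is scheduled per block instead of one per iteration. The function names and the block sizes `bi` and `bj` are assumptions, and the blocks are executed sequentially here so that only the count of scheduling operations is shown.

```python
def blocked_tasks(n, m, bi, bj):
    """Yield one task (i_lo, i_hi, j_lo, j_hi) per bi x bj block
    of an n x m iteration space; edge blocks are clipped."""
    for i0 in range(0, n, bi):
        for j0 in range(0, m, bj):
            yield (i0, min(i0 + bi, n), j0, min(j0 + bj, m))


def run_blocked(n, m, bi, bj, body):
    """Execute body(i, j) over the whole space, block by block.
    Returns the number of tasks that had to be scheduled."""
    scheduled = 0
    for i_lo, i_hi, j_lo, j_hi in blocked_tasks(n, m, bi, bj):
        scheduled += 1  # one scheduling operation per block
        for i in range(i_lo, i_hi):
            for j in range(j_lo, j_hi):
                body(i, j)
    return scheduled


def demo_blocking(n=8, m=8, b=4):
    """Compare per-iteration scheduling (1 x 1 blocks) with b x b blocks."""
    visited = set()
    unblocked = run_blocked(n, m, 1, 1, lambda i, j: visited.add((i, j)))
    blocked = run_blocked(n, m, b, b, lambda i, j: visited.add((i, j)))
    return unblocked, blocked, len(visited)
```

With the default arguments, `demo_blocking()` returns `(64, 4, 64)`: the same 64 iterations are covered, but the blocked version needs 4 scheduling operations instead of 64. Larger blocks reduce scheduling and synchronization traffic at the cost of coarser load balancing, which is the trade-off a real system would have to tune.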