Numerical applications require large amounts of computing
power. Although shared memory multiprocessors provide a cost-
effective platform for parallel execution of numerical programs,
parallel processing has not delivered the expected performance
on these machines. There are two crucial steps in parallel
execution of numerical applications: (1) effective parallelization
of an application and (2) efficient execution of the parallel
program on a multiprocessor. This thesis addresses the second
step within the scope of automatically parallelized FORTRAN programs.
In this thesis, the mismatch between the needs of parallelized
FORTRAN programs and the support for parallel execution in shared
memory multiprocessors is identified as a cause of poor performance.
The thesis addresses this problem from two angles: architectural and
software support for parallel execution, and compiler transformations
that improve program characteristics. Architectural features and
synchronization and scheduling algorithms are studied to increase
the efficiency of support for parallel execution. It is shown that
architectures supporting atomic fetch&add primitives and
synchronization buses can execute programs more effectively.
New algorithms for lock access and parallel task scheduling are
proposed.
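
As an illustration of how a fetch&add primitive supports parallel task
scheduling, the following is a minimal sketch of loop self-scheduling,
not the algorithms proposed in the thesis. Python has no hardware
fetch&add, so the hypothetical FetchAndAdd class emulates the primitive
with a lock; the names FetchAndAdd, self_schedule, and square are
assumptions made for this example.

```python
import threading


class FetchAndAdd:
    """Software stand-in for an atomic fetch&add cell.

    Assumption: a lock emulates the atomicity that the hardware
    primitive would provide in a single operation.
    """

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        # Return the old value and add `increment` as one indivisible step.
        with self._lock:
            old = self._value
            self._value += increment
            return old


def self_schedule(num_iterations, num_workers, body):
    """Run body(i) for every i in range(num_iterations).

    Each worker claims its next iteration with one fetch&add on a
    shared counter, so no iteration is executed twice or skipped.
    """
    next_iteration = FetchAndAdd(0)

    def worker():
        while True:
            i = next_iteration.fetch_and_add(1)
            if i >= num_iterations:
                break
            body(i)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


results = [0] * 16


def square(i):
    # Each iteration writes only its own slot, so iterations are independent.
    results[i] = i * i


self_schedule(16, 4, square)
```

Each claimed iteration costs one access to the shared counter, which is
why the cost of that single operation matters for fine-grained loops.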
The thesis also explores transformations that modify
parallel program characteristics to increase the efficiency of
parallel execution on less sophisticated architectures. It
is shown that by using blocking transformations on nested parallel
loops, program characteristics can be modified to decrease the need
for scheduling and synchronization operations. This results in an
increase in the efficiency of parallel execution, especially for
multiprocessors with simple support for interprocessor
synchronization.
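
The effect of blocking on scheduling overhead can be sketched as
follows. This is one plausible reading of a blocking transformation on
a doubly nested parallel loop, and the thesis's actual transformation
may differ. The function names, the 8 x 8 iteration space, and the
4 x 4 block size are assumptions chosen for illustration. The loops run
sequentially here; the sketch only counts how many tasks a scheduler
would have to hand out.

```python
def doall_unblocked(n, m, body):
    """Treat every (i, j) iteration as its own task; return the task count."""
    tasks = 0
    for i in range(n):
        for j in range(m):
            tasks += 1  # one scheduling operation per iteration
            body(i, j)
    return tasks


def doall_blocked(n, m, bi, bj, body):
    """Treat each bi x bj block of iterations as one task; return the task count."""
    tasks = 0
    for i0 in range(0, n, bi):
        for j0 in range(0, m, bj):
            tasks += 1  # one scheduling operation per block
            # Iterations inside a block run without further scheduling.
            for i in range(i0, min(i0 + bi, n)):
                for j in range(j0, min(j0 + bj, m)):
                    body(i, j)
    return tasks


n, m = 8, 8
a = [[0] * m for _ in range(n)]
b = [[0] * m for _ in range(n)]


def fill_a(i, j):
    a[i][j] = i * m + j


def fill_b(i, j):
    b[i][j] = i * m + j


unblocked_tasks = doall_unblocked(n, m, fill_a)
blocked_tasks = doall_blocked(n, m, 4, 4, fill_b)
print(unblocked_tasks, blocked_tasks)  # prints "64 4"
```

Both versions compute the same result, but the blocked version needs 4
scheduling operations instead of 64, at the cost of coarser load
balancing.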