How to program a parallel machine has always been a major research problem.
Many tools, languages and libraries have been developed to make
parallel programming more accessible to most users. However, no matter
what approach is taken to program a parallel machine, there is always a
trade-off between productivity, performance and portability. It is very hard
to develop a system that requires only short, concise code to achieve
close-to-optimal performance on a wide range of parallel machines.
In this thesis, a novel programming framework is developed to achieve
a good combination of productivity, performance and portability. The
framework is built on computation patterns that carry information about
parallelism, and it efficiently maps these patterns onto a parallel
machine. It also uses C++ templates to generate optimized code for different
compositions of computation patterns. It uses a novel way to implement
the computation patterns, one that allows automatic high-level optimization at
compile time. Benchmarks show that the framework can express
computation kernels in a few lines of code and match the performance of
optimized C implementations of the same kernels on multi-core
CPUs.
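
To make the idea of composing computation patterns through C++ templates more concrete, the following is a minimal sketch and not the framework's actual interface. The names MapExpr, map_pattern, reduce_pattern and sum_of_squares are hypothetical and chosen only for illustration. The sketch assumes a lazily evaluated "map" pattern whose result is consumed by a "reduce" pattern, so that the compiler can fuse the composition into a single loop with no intermediate array. Parallel execution is omitted here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "map" pattern: records the input and the element-wise
// function, and evaluates f(in[i]) only when an element is requested.
template <typename F>
struct MapExpr {
    const std::vector<double>& in;
    F f;
    std::size_t size() const { return in.size(); }
    double operator[](std::size_t i) const { return f(in[i]); }
};

// Helper that deduces the function type F (hypothetical name).
template <typename F>
MapExpr<F> map_pattern(const std::vector<double>& in, F f) {
    return MapExpr<F>{in, f};
}

// Hypothetical "reduce" pattern: accepts any expression type that exposes
// size() and operator[]. Because the expression type is a template
// parameter, the call e[i] can be inlined, so map and reduce run in a
// single loop without a temporary array.
template <typename Expr, typename Op>
double reduce_pattern(const Expr& e, double init, Op op) {
    double acc = init;
    for (std::size_t i = 0; i < e.size(); ++i) {
        acc = op(acc, e[i]);
    }
    return acc;
}

// A kernel expressed as a composition of the two patterns.
double sum_of_squares(const std::vector<double>& v) {
    return reduce_pattern(
        map_pattern(v, [](double x) { return x * x; }),
        0.0,
        [](double a, double b) { return a + b; });
}
```

In this sketch the composition is resolved entirely at compile time: the type of the map expression encodes which function is applied, and the reduce pattern is instantiated specifically for that type. A framework of the kind described above could, under the same assumption, choose a parallel schedule for the fused loop based on the pattern types involved.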