We propose Tangram, a general-purpose high-level language
that achieves high performance across architectures. In Tangram, a program
is written by synthesizing small, elemental code snippets called
codelets. A codelet can have multiple semantic-preserving implementations
to enable automated algorithm and implementation selection. An
implementation of a codelet can be written with tunable knobs to allow
architecture-specific parameterization. The Tangram compiler produces
highly optimized code by choosing and composing architecture-friendly
codelets, and then tuning the knobs for the target architecture.
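The selection-and-tuning workflow can be illustrated with a minimal Python sketch. It is not Tangram syntax or the Tangram compiler. The registry, the `codelet` decorator, the two reduction implementations `sum_serial` and `sum_tiled`, the `tile` knob, and the timing-based `tune` search are all hypothetical stand-ins. The sketch assumes that "selection" means picking the fastest measured (implementation, knob) pair on a sample input.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical registry: codelet name -> list of (impl_name, impl_fn, knob_values).
# Every implementation registered under one name must compute the same result.
Impl = Callable[[Sequence[float], int], float]
REGISTRY: Dict[str, List[Tuple[str, Impl, List[int]]]] = {}


def codelet(name: str, knobs: List[int]):
    """Register one implementation of codelet `name` with its candidate knob values."""
    def register(fn: Impl) -> Impl:
        REGISTRY.setdefault(name, []).append((fn.__name__, fn, knobs))
        return fn
    return register


@codelet("sum", knobs=[1])
def sum_serial(xs: Sequence[float], _unused: int) -> float:
    # Straightforward sequential loop; the knob value is ignored.
    total = 0.0
    for x in xs:
        total += x
    return total


@codelet("sum", knobs=[64, 256, 1024])
def sum_tiled(xs: Sequence[float], tile: int) -> float:
    # Reduce each tile of `tile` elements, then combine the partial sums.
    partials = [sum(xs[i:i + tile]) for i in range(0, len(xs), tile)]
    return sum(partials)


def tune(name: str, sample: Sequence[float], reps: int = 3) -> Tuple[str, int]:
    """Time every (implementation, knob) pair on `sample` and return the fastest."""
    best: Tuple[str, int] = ("", 0)
    best_time = float("inf")
    for impl_name, fn, knobs in REGISTRY[name]:
        for k in knobs:
            start = time.perf_counter()
            for _ in range(reps):
                fn(sample, k)
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best, best_time = (impl_name, k), elapsed
    return best


def run(name: str, choice: Tuple[str, int], xs: Sequence[float]) -> float:
    """Invoke the implementation and knob value chosen by `tune`."""
    impl_name, k = choice
    for n, fn, _ in REGISTRY[name]:
        if n == impl_name:
            return fn(xs, k)
    raise KeyError(impl_name)


data = [float(i) for i in range(10_000)]
choice = tune("sum", data)           # the winner depends on the machine
result = run("sum", choice, data)
print(choice, result)
```

Because every registered implementation is interchangeable, the search can swap one for another without changing the program's result; only the measured cost differs. Exhaustive timing is the simplest possible search strategy. A production compiler might instead prune candidates with a cost model, but that is outside what this sketch shows.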
We demonstrate that Tangram's synthesized programs are comparable in
performance to existing well-optimized codes on both CPUs and GPUs.
The language is designed to be concise and maintainable, which improves
debuggability and enables progressive improvement. This design lets
users extend their applications and achieve higher performance
on both existing and new architectures.