We describe an innovative, low-power, high-performance, programmable
digital signal processor (DSP) for digital communications. The architecture of
this processor is characterized by its explicit design for low-power
implementations, its ability to jointly exploit
instruction-level parallelism and data-level parallelism to achieve high
performance, its suitability as a target for an optimizing high-level
language compiler, and its explicit replacement of hardware resources with
compile-time practices. We describe the methodology used in the
development of the processor, highlighting the techniques deployed to
enable application/architecture/compiler/implementation co-development,
and the optimization approach and metric used for power-performance
evaluation and tradeoff analysis. We summarize the salient features of
the architecture, provide a brief description of the hardware
organization, and discuss the compiler techniques used to exercise these
features. We also summarize the simulation environment and associated
software development tools, and provide coding examples from two
representative kernels in the digital communications domain. The
resulting methodology, architecture, and compiler advance the state of
the art in low-power, domain-specific microprocessors.