In this paper, we propose a parallel design of a Viterbi decoder for
Software-Defined Radio (SDR). Our method implements a divide-and-conquer
approach by tiling decoding sequences, performing independent
speculative Viterbi decoding on each tile, and merging partial candidate paths into
the final path. For each independent Viterbi decoding, the best path is
selected by calculating Hamming distances trellis-by-trellis in
parallel. Our method achieves a speedup of up to 14.6x on an NVIDIA 8800 GTX
over a sequential C implementation on a 2.4 GHz Intel Core 2 CPU. Also,
compared with the existing GPU-based implementation, our method
is up to 2.5x faster.
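
The tile, decode, and merge steps can be illustrated with a minimal sequential sketch. It is not the GPU implementation described here; it assumes a rate-1/2 convolutional code with constraint length 3 and generators (7, 5) octal, hard-decision input, and one possible speculation scheme in which every tile is decoded from every possible start state. The function names (step, encode, decode_tile, decode_tiled) are hypothetical.

```python
K = 3                    # constraint length (assumed for illustration)
G = (0b111, 0b101)       # generator polynomials 7 and 5 octal (assumed)
N_STATES = 1 << (K - 1)  # number of trellis states


def step(state, bit):
    """One encoder transition: return (next_state, output_symbols)."""
    reg = (bit << (K - 1)) | state
    out = tuple(bin(reg & g).count("1") & 1 for g in G)
    return reg >> 1, out


def encode(bits):
    """Encode a bit list, starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        state, sym = step(state, b)
        out.extend(sym)
    return out


def decode_tile(symbols, start):
    """Viterbi over one tile from a fixed start state.

    Returns {end_state: (hamming_metric, decoded_bits)}.
    """
    paths = {start: (0, [])}
    n = len(G)
    for i in range(0, len(symbols), n):
        rx = symbols[i:i + n]
        nxt = {}
        for s, (metric, bits) in paths.items():
            for b in (0, 1):
                ns, out = step(s, b)
                cost = metric + sum(x != y for x, y in zip(out, rx))
                if ns not in nxt or cost < nxt[ns][0]:
                    nxt[ns] = (cost, bits + [b])
        paths = nxt
    return paths


def decode_tiled(symbols, tile_steps):
    """Tile the input, decode tiles speculatively, then merge the paths."""
    size = tile_steps * len(G)
    tiles = [symbols[i:i + size] for i in range(0, len(symbols), size)]
    # Speculative part: each (tile, start state) pair is independent of the
    # others, so these decodings could run in parallel.
    tables = [{s: decode_tile(t, s) for s in range(N_STATES)} for t in tiles]
    # Merge part: chain partial candidate paths tile by tile, starting
    # from the known initial state 0.
    best = {0: (0, [])}
    for table in tables:
        nxt = {}
        for s, (metric, bits) in best.items():
            for e, (tile_metric, tile_bits) in table[s].items():
                cost = metric + tile_metric
                if e not in nxt or cost < nxt[e][0]:
                    nxt[e] = (cost, bits + tile_bits)
        best = nxt
    return min(best.values(), key=lambda p: p[0])
```

Calling decode_tiled(encode(msg), 6) returns a (Hamming metric, decoded bits) pair. Because the merge step considers every start and end state of each tile, the resulting metric matches that of an untiled decode_tile(symbols, 0) run over the whole sequence.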