With the advent of accurate and sophisticated statistical algorithms and pipelines for DNA sequence analysis, it is becoming increasingly possible to translate raw sequencing data into biologically meaningful information for further clinical analysis and processing. However, given the large volume of data involved, even modestly complex algorithms can take prohibitively long to complete. Non-conventional implementation platforms are therefore needed to accelerate genomics research. In this work, we present an FPGA-accelerated implementation of the Pair HMM forward algorithm, the performance bottleneck in HaplotypeCaller, a critical function in the popular GATK variant calling tool. We introduce the processing element (PE) ring structure which, thanks to the fine-grained parallelism allowed by the FPGA, can be built in various configurations that trade off instruction-level parallelism (ILP) against data parallelism. We investigate the resource utilization and performance of the different configurations. Our solution achieves a speed-up of up to 487× over the C++ baseline implementation on CPU and 1.56× over the best published hardware implementation.