Motivation: Rapid advances in next-generation sequencing (NGS)
technology have led to an exponential increase in the amount of genomic
information. However, NGS reads contain far more errors than data
from traditional sequencing methods, and downstream genomic analysis
results can be improved by correcting the errors. Unfortunately,
all the previous error correction methods required a large amount of
memory, making them unsuitable for processing reads from large genomes
with commodity computers.
Results: We present a novel algorithm that produces accurate correction
results with much less memory than previous solutions.
The algorithm, named BLoom-filter-based Error correction Solution for
high-throughput Sequencing reads (BLESS), uses a single minimum-sized
Bloom filter, and is also able to tolerate a higher false-positive
rate, thus allowing us to correct errors with a 40× memory usage reduction
on average compared with previous methods. In addition, BLESS
can extend reads, as DNA assemblers do, to correct errors at the ends of
reads. Evaluations using real and simulated reads showed that BLESS
could generate more accurate results than existing solutions. After errors
were corrected using BLESS, 69% of initially unaligned reads could be
aligned correctly. Additionally, de novo assembly results became 50%
longer with 66% fewer assembly errors.
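
Illustration: the following minimal Python sketch shows how a single Bloom filter can hold a set of trusted substrings of reads and how its size follows from a chosen false-positive rate. It is not the BLESS implementation. It assumes, as is common in k-mer-based error correctors, that the filter stores frequently occurring ("solid") k-mers, and it uses the standard sizing formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The names BloomFilter, kmers and weak_kmer_positions are hypothetical and chosen only for this example.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter; illustrative only, not the BLESS implementation."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits,
        # k = (m / n) * ln 2 hash functions.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two 64-bit
        # halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return True for an item never added (false positive),
        # but never returns False for an item that was added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Return all overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def weak_kmer_positions(read, k, solid):
    """Start positions of k-mers that are absent from the solid set."""
    return [i for i, kmer in enumerate(kmers(read, k)) if kmer not in solid]


if __name__ == "__main__":
    solid = BloomFilter(expected_items=1000, false_positive_rate=0.01)
    for kmer in kmers("ACGTTGCATGCCGATAGCTA", 5):
        solid.add(kmer)
    # A read with one substituted base; k-mers covering it are likely,
    # though not guaranteed, to be reported as weak.
    print(weak_kmer_positions("ACGTTGCATCCCGATAGCTA", 5, solid))
```

Raising the false-positive rate in this sketch shrinks the bit array, which is the trade-off the abstract refers to: a corrector that tolerates more false positives can use a smaller filter.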