Breadth-first search (BFS) has wide applications in electronic
design automation (EDA) as well as in other fields.
Researchers have tried to accelerate BFS on the GPU, but
the two published works are both asymptotically slower than
the fastest CPU implementation. In this paper, we present
a new GPU implementation of BFS that uses a hierarchical
queue management technique and a three-layer kernel arrangement
strategy. It guarantees the same computational
complexity as the fastest sequential version and can achieve
up to 10 times speedup.