This work presents an update to the triangle-counting portion of the
subgraph isomorphism static graph challenge. It is motivated
by a desire to understand the impact of CUDA unified memory on
the triangle-counting problem. First, we use CUDA unified memory to
overlap reading large graph data from disk with building graph data structures in
GPU memory. Second, we use CUDA unified memory hints to solve the multi-GPU
performance scaling challenges present in our previous submission. Finally,
we improve the single-GPU kernel performance from our previous submission by
introducing a dynamic work-stealing GPU kernel with
persistent threads, which lets performance adapt to large graphs
without requiring a graph analysis phase.