Unify NB atoms and staging data structures in OpenCL, CUDA and SYCL