make use of CUDA stream priorities