Add CUDA device-to-device copy wrapper function with tests