Remove unnecessary CUDA stream synchronization calls