Use cudaLaunchKernel with CUDA 7.0 and later
CUDA 7.0 introduced the cudaLaunchKernel API call, which is similar to
the kernel launch in the CUDA driver API and avoids the chevron
notation. This slightly reduces the runtime API overhead (by up to 2%),
partly because the two runtime API calls that otherwise precede the
kernel launch (cudaSetupArgument and cudaConfigureCall) are skipped.
For future development testing, the GMX_DISABLE_CUDALAUNCH environment
variable can be set to force the chevron-notation kernel launch.
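For illustration only, a minimal sketch of the two launch paths. The
kernel scaleKernel and the wrapper launchScale are hypothetical names
and are not taken from this change. The sketch also assumes that the
environment variable is read with getenv at every launch, which may
differ from how the actual code checks it.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical example kernel: multiplies each of the n elements of x by a.
__global__ void scaleKernel(float* x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        x[i] *= a;
    }
}

// Hypothetical wrapper: launches scaleKernel through cudaLaunchKernel by
// default, or with the chevron notation when GMX_DISABLE_CUDALAUNCH is set.
cudaError_t launchScale(float* d_x, float a, int n, cudaStream_t stream)
{
    if (n <= 0)
    {
        return cudaSuccess; // nothing to do; a zero-sized grid is invalid
    }

    const dim3 block(128);
    const dim3 grid((n + block.x - 1) / block.x);

    if (std::getenv("GMX_DISABLE_CUDALAUNCH") != nullptr)
    {
        // Chevron-notation launch
        scaleKernel<<<grid, block, 0, stream>>>(d_x, a, n);
        return cudaGetLastError();
    }

    // cudaLaunchKernel expects an array holding one pointer per kernel
    // argument, in the order of the kernel's parameter list.
    void* args[] = { &d_x, &a, &n };
    return cudaLaunchKernel((const void*)scaleKernel, grid, block, args, 0, stream);
}
```

Both branches pass the same grid, block, shared memory size and stream,
so under these assumptions the choice of path should affect only the
launch overhead and not the results.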
Change-Id: Id057fb01489814b99ae290de9e4ddd9f530a04be