F buffer operations in CUDA
This patch performs the force buffer operations on the GPU.
Enable by setting the GMX_USE_GPU_BUFFER_OPS environment variable.
Currently, the H2D transfer of the force buffer is performed when
haveSpecialForces || haveCpuBondedWork || haveCpuPmeWork
is true. haveCpuPmeWork is set even when useGpuPme == true,
until on-GPU PME-nonbonded reduction is added in a follow-up change.
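The transfer condition above can be sketched as follows. This is an
illustrative sketch only: ForceWorkFlags and needForceH2DTransfer are
hypothetical names chosen to mirror the flags in this message, not the
actual GROMACS types or functions.

```cpp
// Hypothetical flag bundle; field names mirror the commit text only.
struct ForceWorkFlags
{
    bool haveSpecialForces; // forces that are computed on the CPU
    bool haveCpuBondedWork; // bonded interactions computed on the CPU
    bool haveCpuPmeWork;    // for now also set when useGpuPme == true
};

// Returns true when CPU-side force contributions exist, so the force
// buffer has to be copied host-to-device before the GPU buffer ops run.
constexpr bool needForceH2DTransfer(ForceWorkFlags flags)
{
    return flags.haveSpecialForces || flags.haveCpuBondedWork || flags.haveCpuPmeWork;
}
```

With haveCpuPmeWork set for GPU PME runs as well, this condition
evaluates to true for such runs regardless of the other two flags.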
TODO: enable PME reduction in GPU buffer ops and remove the
associated H2D transfer.
Implements part of #2817
Change-Id: Ice984425301d24bac1340e883698244489cd686e