Decouple GPU force buffer management from buffer ops in NBNXM
When GPU-side buffer operations are used, the total forces on the
device are accumulated in a local GPU buffer inside the NBNXM module.
By decoupling the CPU and GPU buffer operations and making the
force buffer an argument of the reduction function, this commit
allows responsibility for managing the GPU forces to be moved from
the NBNXM module to an external object.
This commit is a refactoring in preparation for the introduction
of the GPU-side PropagatorStateData object.
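
A minimal host-side sketch of the idea, not the actual NBNXM code:
the reduction takes the total-force buffer as an argument instead of
writing into a buffer owned by the module. The names Float3 and
reduceForces are hypothetical, and std::vector stands in for a device
buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3-component force vector.
struct Float3
{
    float x, y, z;
};

// Hypothetical reduction: accumulates the module's nonbonded forces
// into a total-force buffer that is owned and passed in by the caller.
// The module no longer needs to allocate or hold that buffer itself.
void reduceForces(std::vector<Float3>&       totalForces,
                  const std::vector<Float3>& nonbondedForces)
{
    assert(totalForces.size() == nonbondedForces.size());
    for (std::size_t i = 0; i < totalForces.size(); ++i)
    {
        totalForces[i].x += nonbondedForces[i].x;
        totalForces[i].y += nonbondedForces[i].y;
        totalForces[i].z += nonbondedForces[i].z;
    }
}
```

With this shape, whichever object owns the forces allocates the
buffer once and passes it to every reduction call.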
TODO: Use DeviceBuffer when passing the PME GPU forces buffer.
Refs. #2816
Change-Id: I2a1f9d12fad3fb5b2ce37ca3ed3d0cb91777c468