Remove excessive H2D and D2H copies of forces when update is offloaded

Forces are copied H2D only on steps when force buffer ops are not
offloaded to the GPU (e.g. steps when virials are computed).
The D2H copy is issued:
- after force buffer ops if the update is not offloaded or if the
forces are needed on the CPU for separate PME rank force reduction or
for vsite force spreading.
- for the force output if the update is offloaded and the forces have
not been copied yet.
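
The copy conditions above can be sketched as a small decision function.
This is only an illustration of the rules as stated in this message, not
the code in the change itself: the struct, field and function names are
hypothetical, and the real scheduling code may check further conditions.

```cpp
// Hypothetical per-step flags; names are illustrative, not taken from the source tree.
struct StepFlags
{
    bool useGpuFBufferOps; // force buffer ops are offloaded to the GPU this step
    bool useGpuUpdate;     // update is offloaded to the GPU
    bool separatePmeRank;  // forces needed on CPU for separate PME rank force reduction
    bool haveVsites;       // forces needed on CPU for vsite force spreading
    bool needForceOutput;  // forces are written to output this step
};

// Which force copies are issued on a step.
struct CopyPlan
{
    bool h2d;               // host-to-device copy of forces
    bool d2hAfterBufferOps; // device-to-host copy right after force buffer ops
    bool d2hForOutput;      // device-to-host copy issued only for force output
};

CopyPlan planForceCopies(const StepFlags& f)
{
    CopyPlan plan{};
    // H2D only on steps when force buffer ops are not offloaded (e.g. virial steps).
    plan.h2d = !f.useGpuFBufferOps;
    // D2H after buffer ops when the CPU needs the forces.
    plan.d2hAfterBufferOps = !f.useGpuUpdate || f.separatePmeRank || f.haveVsites;
    // D2H for output only if update is offloaded and forces were not copied yet.
    plan.d2hForOutput = f.useGpuUpdate && f.needForceOutput && !plan.d2hAfterBufferOps;
    return plan;
}
```

With update offloaded, no separate PME rank and no vsites, the only D2H
copy left in this sketch is the one for force output.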
Also added a note that the latter copy needs to ensure the forces are
ready, using the same mechanism that is used for synchronizing update on
the availability of forces.
Change-Id: Ied74638d35e74a8970427df712501cd63e0aa0ab