Remove excessive H2D and D2H coordinates copies when update is offloaded
The H2D copies are only needed:
1. When update is not ofloaded.
2. At the search steps, after device buffers were reinitialized.
The D2H copies are only needed:
1. On the search steps, since the device buffers are reinitialized.
2. If there are CPU consumers, e.g. CPU bondeds.
3. When the energy is computed.
4. When coordinates are needed for output.
There are two special cases, when coordinates are needed on host,
that dealt with separately:
1. When the PME it tuned.
2. When center of mass motion is removed.
The locality of copied atoms when update is offloaded is changed
from All to Local in preparation for multi-GPU case. The blocking sync
on H2D copy event is moved from UpdateConstraints to
StatePropagatorDataGpu.
Change-Id: I971a6273b39fa7da07600312c085ce343b5d25ee