Fix incorrect event dependency of GPU update
With separate PME ranks, staged communication and GPU update, the final
forces are produced on the CPU and re-uploaded to the GPU prior to
update. This upload was done in the "All" locality, which led to an
incorrect dependency on an event corresponding to the GPU reduction.
This is a logic error that causes incorrect synchronization, but not
incorrect results: the "All" locality copy is issued on the update
stream, so the implicit in-stream ordering dependency still applies.
This change moves this special-case copy to the same locality used by
the other host-to-device force copies, which eliminates the bug.
Fixes #4130