Make stepWorkload.useGpuXBufferOps flag consistent
On search steps we do not use x buffer ops, so the workload flag should
correctly reflect that.
This change also slightly refactors a conditional block to clarify the
scope of the workload flags.
Note that as a side effect of this change, the coordinate H2D copy is
delayed from the beginning of do_force() to just before the update on
search steps when no force tasks require it (i.e. without PME).
While this is not ideal for performance, the code is easier to reason
about.
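The intended flag semantics can be illustrated with a minimal sketch. It
is not the actual implementation: the struct layouts, the helper names
setupStepWorkload() and copyCoordinatesAtStartOfForceCalc(), and the
useGpuPme field are simplified, hypothetical stand-ins chosen only to
show how the step-level flag relates to search steps and to the timing
of the coordinate copy.

```cpp
// Hypothetical, simplified stand-ins for the run-level and step-level
// workload descriptions; the real types carry many more fields.
struct SimulationWorkload
{
    bool useGpuXBufferOps = false; // run-level: x buffer ops on GPU are enabled
    bool useGpuPme        = false; // run-level: a force task (PME) needs x on the GPU
};

struct StepWorkload
{
    bool doNeighborSearch = false; // this step is a search step
    bool useGpuXBufferOps = false; // step-level: x buffer ops are used on this step
};

// Assumed rule: the step-level flag is only set on non-search steps,
// because x buffer ops are not used on search steps.
StepWorkload setupStepWorkload(const SimulationWorkload& simWork, bool isSearchStep)
{
    StepWorkload stepWork;
    stepWork.doNeighborSearch = isSearchStep;
    stepWork.useGpuXBufferOps = simWork.useGpuXBufferOps && !isSearchStep;
    return stepWork;
}

// Assumed rule: the coordinate H2D copy is launched at the start of the
// force calculation only if something on this step consumes it there.
// Otherwise it is left until just before the update.
bool copyCoordinatesAtStartOfForceCalc(const SimulationWorkload& simWork,
                                       const StepWorkload&       stepWork)
{
    return stepWork.useGpuXBufferOps || simWork.useGpuPme;
}
```

Under these assumptions, a search step without PME returns false from
copyCoordinatesAtStartOfForceCalc(), which corresponds to the delayed
copy described above.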
Refs #3915 #3913 #4268