Move initiation of local CPU force H2D transfer to producer
For GPU DD cases that include CPU force calculations, a host-to-device
transfer of the local force data is required before the GPU halo
exchange. Previously, this transfer was initiated immediately before
its consumer (the GPU halo exchange). This change moves the initiation
to immediately after the last possible producer, the special force
calculation (note that CPU force contributions can also come from the
preceding bonded or PME calculations).
Addresses #3082