Clarify force buffer setup code in do_force
Refactored the code and flattened the nested conditionals to make it
easier to understand when a common or a separate buffer is used for
the forces when the direct virial contribution is computed.
Also added a subcounter for force buffer clearing, which helps annotate
code that should be conditional on whether any of these buffers is used
to accumulate forces or only as a copy target (e.g. with everything
offloaded to a GPU).
Refs #2802
Change-Id: I3fa5a3e4e4adf5cfe0eb417f0c1c3d0ed4a96769