Move initiation of local CPU force H2D transfer to producer
author Alan Gray <alangray3@gmail.com>
Wed, 27 Jan 2021 14:51:06 +0000 (06:51 -0800)
committer Alan Gray <alangray3@gmail.com>
Thu, 28 Jan 2021 16:18:12 +0000 (16:18 +0000)
For GPU DD cases that include CPU force calculations, a host-to-device
transfer of local force data is required before the GPU halo
exchange. Previously, the transfer was initiated immediately
before the consumer (the GPU halo exchange). This change moves the
initiation to immediately after the last possible producer (the
special force calculation; note that CPU force contributions can
also come from the preceding bonded or PME calculations).
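
The effect of launching the transfer at the producer rather than at the
consumer can be illustrated with a toy timing model. This is only a
sketch under stated assumptions: the transfer is asynchronous and
overlaps fully with subsequent CPU work, launch overhead is ignored,
and the struct and function names are hypothetical and do not exist in
GROMACS. It makes no claim about measured performance.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical cost model in arbitrary time units (not GROMACS code).
struct StepCosts
{
    int lastProducerEnd;  // time at which the last CPU force producer finishes
    int otherCpuWork;     // CPU work between that producer and the halo exchange
    int transferDuration; // duration of the asynchronous H2D force copy
};

// Time at which the CPU reaches the consumer (the halo exchange call site).
int consumerReached(const StepCosts& c)
{
    assert(c.otherCpuWork >= 0 && c.transferDuration >= 0);
    return c.lastProducerEnd + c.otherCpuWork;
}

// Old schedule: the copy is launched only when the consumer is reached,
// so the whole transfer sits on the critical path.
int haloStartLateLaunch(const StepCosts& c)
{
    return consumerReached(c) + c.transferDuration;
}

// New schedule: the copy is launched right after the last producer,
// so it can overlap with the remaining CPU work (assumed perfect overlap).
int haloStartEarlyLaunch(const StepCosts& c)
{
    const int transferDone = c.lastProducerEnd + c.transferDuration;
    return std::max(consumerReached(c), transferDone);
}
```

In this model the early launch is never later than the late launch, and
the transfer is hidden entirely whenever the intervening CPU work takes
at least as long as the copy.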

Addresses #3082

src/gromacs/mdlib/sim_util.cpp

index c4c3f5e7ca987f2c787abe8c090a24371c61a54d..2494b02c6b529ad2b2034d70b3d8df18b6603490 100644
@@ -3,7 +3,7 @@
  *
  * Copyright (c) 1991-2000, University of Groningen, The Netherlands.
  * Copyright (c) 2001-2004, The GROMACS development team.
- * Copyright (c) 2013-2019,2020, by the GROMACS development team, led by
+ * Copyright (c) 2013-2019,2020,2021, by the GROMACS development team, led by
  * Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
  * and including many others, as listed in the AUTHORS file in the
  * top-level source directory and at http://www.gromacs.org.
@@ -1926,6 +1926,12 @@ void do_force(FILE*                               fplog,
                          ed,
                          stepWork.doNeighborSearch);
 
+    if (havePPDomainDecomposition(cr) && stepWork.computeForces && stepWork.useGpuFHalo
+        && domainWork.haveCpuLocalForceWork)
+    {
+        stateGpu->copyForcesToGpu(forceOutMtsLevel0.forceWithShiftForces().force(), AtomLocality::Local);
+    }
+
     GMX_ASSERT(!(nonbondedAtMtsLevel1 && stepWork.useGpuFBufferOps),
                "The schedule below does not allow for nonbonded MTS with GPU buffer ops");
     GMX_ASSERT(!(nonbondedAtMtsLevel1 && stepWork.useGpuFHalo),
@@ -2024,11 +2030,6 @@ void do_force(FILE*                               fplog,
 
             if (stepWork.useGpuFHalo)
             {
-                if (domainWork.haveCpuLocalForceWork)
-                {
-                    stateGpu->copyForcesToGpu(forceOutMtsLevel0.forceWithShiftForces().force(),
-                                              AtomLocality::Local);
-                }
                 communicateGpuHaloForces(*cr, domainWork.haveCpuLocalForceWork);
             }
             else