Removed dependency on commrec of mdrun setup
authorMark Abraham <mark.j.abraham@gmail.com>
Tue, 21 May 2019 14:45:17 +0000 (16:45 +0200)
committerMark Abraham <mark.j.abraham@gmail.com>
Tue, 1 Oct 2019 05:24:02 +0000 (07:24 +0200)
Changes no functionality.

Setup is now parameterized directly on MPI_COMM_WORLD, which we will
want later for letting library-based callers pass in an
MPI_Communicator. This permits commrec to be initialized later, once
the threads have been spawned for the thread-MPI ranks.

The initialization of multi-simulations moves from LegacyMdrunOptions
to SimulationContext, which is more appropriate for
ensemble-parallelism established directly by the user.

Before the decision about the duty of a rank, there is no difference
between MASTER(cr) and SIMMASTER(cr), so several calls to macros
taking a t_commrec pointer are replaced by booleans. Introduced
findIsSimulationMasterRank to compute that value. This eliminates
early use of t_commrec that has necessitated other hacks and
workarounds.

Removed redundant check for replica exchange when the number of multi
simulations is less than two, because gmx_multisim_t constructor
already prohibits that.

Resolves several TODO items and improves modularity, too.

Refs #2587, #2605, #3081

Change-Id: I48bd3b713bc181b5c1e4cbcd648706a9f00eab96

20 files changed:
src/api/cpp/context.cpp
src/api/cpp/createsession.h
src/api/cpp/session.cpp
src/api/cpp/session_impl.h
src/gromacs/applied_forces/tests/electricfield.cpp
src/gromacs/gmxlib/network.cpp
src/gromacs/gmxlib/network.h
src/gromacs/hardware/printhardware.h
src/gromacs/mdrun/legacymdrunoptions.cpp
src/gromacs/mdrun/legacymdrunoptions.h
src/gromacs/mdrun/runner.cpp
src/gromacs/mdrun/runner.h
src/gromacs/mdrun/simulationcontext.cpp
src/gromacs/mdrun/simulationcontext.h
src/gromacs/mdrunutility/handlerestart.cpp
src/gromacs/mdrunutility/handlerestart.h
src/gromacs/mdrunutility/multisim.cpp
src/gromacs/mdrunutility/multisim.h
src/gromacs/tools/pme_error.cpp
src/programs/mdrun/mdrun.cpp

index a2fccfc99ebb1439902df226f67ae18041eefe37..fd3b4e5e623c7dbacdafa7038dede3dd17ad5dc6 100644 (file)
 #include "gromacs/commandline/pargs.h"
 #include "gromacs/commandline/filenm.h"
 #include "gromacs/commandline/pargs.h"
-#include "gromacs/gmxlib/network.h"
 #include "gromacs/mdlib/stophandler.h"
 #include "gromacs/mdrunutility/logging.h"
 #include "gromacs/mdrunutility/multisim.h"
 #include "gromacs/mdrun/runner.h"
 #include "gromacs/mdrunutility/handlerestart.h"
-#include "gromacs/mdtypes/commrec.h"
 #include "gromacs/utility/arraysize.h"
 #include "gromacs/utility/fatalerror.h"
 #include "gromacs/utility/smalloc.h"
@@ -150,17 +148,7 @@ std::shared_ptr<Session> ContextImpl::launch(const Workflow &work)
             strcpy(argv[argvIndex], mdArg.c_str());
         }
 
-        auto mdModules = std::make_unique<MDModules>();
-
-        // pointer-to-t_commrec is the de facto handle type for communications record.
-        // Complete shared / borrowed ownership requires a reference to this stack variable
-        // (or pointer-to-pointer-to-t_commrec) since borrowing code may update the pointer.
-        // \todo Define the ownership and lifetime management semantics for a communication record, handle or value type.
-
-        // Note: this communications record initialization acts directly on
-        // MPI_COMM_WORLD and is incompatible with MPI environment sharing in
-        // gmxapi through 0.0.7, at least.
-        options_.cr = init_commrec();
+        auto        mdModules = std::make_unique<MDModules>();
 
         const char *desc[]  = {"gmxapi placeholder text"};
         if (options_.updateFromCommandLine(argc, argv.data(), desc) == 0)
@@ -168,17 +156,32 @@ std::shared_ptr<Session> ContextImpl::launch(const Workflow &work)
             return nullptr;
         }
 
-        StartingBehavior startingBehavior = StartingBehavior::NewSimulation;
-        LogFilePtr       logFileGuard     = nullptr;
+        ArrayRef<const std::string> multiSimDirectoryNames =
+            opt2fnsIfOptionSet("-multidir", ssize(options_.filenames), options_.filenames.data());
+        // Set up the communicator, where possible (see docs for
+        // SimulationContext).
+        MPI_Comm          communicator = GMX_LIB_MPI ? MPI_COMM_WORLD : MPI_COMM_NULL;
+        // The SimulationContext is necessary with gmxapi so that
+        // resources owned by the client code can have suitable
+        // lifetime. The gmx wrapper binary uses the same infrastructure,
+        // but the lifetime is now trivially that of the invocation of the
+        // wrapper binary.
+        SimulationContext simulationContext(communicator, multiSimDirectoryNames);
+
+
+        StartingBehavior  startingBehavior = StartingBehavior::NewSimulation;
+        LogFilePtr        logFileGuard     = nullptr;
+        gmx_multisim_t   *ms               = simulationContext.multiSimulation_.get();
         std::tie(startingBehavior,
-                 logFileGuard) = handleRestart(options_.cr.get(),
-                                               options_.ms.get(),
+                 logFileGuard) = handleRestart(findIsSimulationMasterRank(ms, communicator),
+                                               communicator,
+                                               ms,
                                                options_.mdrunOptions.appendingBehavior,
                                                ssize(options_.filenames),
                                                options_.filenames.data());
-        auto simulationContext = createSimulationContext(std::move(options_.cr));
 
-        auto builder = MdrunnerBuilder(std::move(mdModules));
+        auto builder = MdrunnerBuilder(std::move(mdModules),
+                                       compat::not_null<SimulationContext*>(&simulationContext));
         builder.addSimulationMethod(options_.mdrunOptions, options_.pforce, startingBehavior);
         builder.addDomainDecomposition(options_.domdecOptions);
         // \todo pass by value
@@ -189,9 +192,6 @@ std::shared_ptr<Session> ContextImpl::launch(const Workflow &work)
         builder.addUpdateTaskAssignment(options_.update_opt_choices[0]);
         builder.addNeighborList(options_.nstlist_cmdline);
         builder.addReplicaExchange(options_.replExParams);
-        // \todo take ownership of multisim resources (ms)
-        builder.addMultiSim(options_.ms.get());
-        // \todo Provide parallelism resources through SimulationContext.
         // Need to establish run-time values from various inputs to provide a resource handle to Mdrunner
         builder.addHardwareOptions(options_.hw_opt);
         // \todo File names are parameters that should be managed modularly through further factoring.
@@ -206,8 +206,7 @@ std::shared_ptr<Session> ContextImpl::launch(const Workflow &work)
         launchedSession = createSession(shared_from_this(),
                                         std::move(builder),
                                         std::move(simulationContext),
-                                        std::move(logFileGuard),
-                                        options_.ms.get());
+                                        std::move(logFileGuard));
 
         // Clean up argv once builder is no longer in use
         for (auto && string : argv)
index 0d44662be5ec9b5b8aa4255392fa0166278fb0a8..487d0652683f66ec5316d77f6188b49ef3573bd4 100644 (file)
@@ -46,8 +46,6 @@
 #include "gmxapi/context.h"
 #include "gmxapi/session.h"
 
-struct gmx_multisim_t;
-
 namespace gmx
 {
 class MdrunnerBuilder;
@@ -73,8 +71,7 @@ namespace gmxapi
 std::shared_ptr<Session> createSession(std::shared_ptr<ContextImpl>  context,
                                        gmx::MdrunnerBuilder        &&runnerBuilder,
                                        gmx::SimulationContext      &&simulationContext,
-                                       gmx::LogFilePtr               logFilehandle,
-                                       gmx_multisim_t              * multiSim);
+                                       gmx::LogFilePtr               logFilehandle);
 
 
 }      // end namespace gmxapi
index f0089d4bb391c553be3d7be561353e6c65a1bb48..a2cae740cbd46d00f7c042f04aa258ff9ff176f1 100644 (file)
 #include "gmxapi/session.h"
 
 #include <memory>
-#include "gromacs/gmxlib/network.h"
 #include "gromacs/mdlib/sighandler.h"
 #include "gromacs/mdrunutility/logging.h"
-#include "gromacs/mdrunutility/multisim.h"
 #include "gromacs/restraint/restraintpotential.h"
 #include "gromacs/utility/gmxassert.h"
 #include "gromacs/utility/basenetwork.h"
@@ -65,14 +63,17 @@ namespace gmxapi
 /*!
  * \brief Provide RAII management of communications resource state.
  *
- * To acquire an MpiContextManager is to have assurance that the GROMACS MPI
+ * To acquire an MpiContextManager is to have assurance that any external MPI
  * environment is ready to use. When the MpiContextManager is released or
  * goes out of scope, the destructor finalizes the resources.
  *
- * \todo Figure out how to manage MPI versus tMPI.
- * \todo gmx::init should take a subcommunicator rather than use MPI_COMM_WORLD
+ * Note that thread-MPI chooses the number of ranks and constructs its
+ * MPI communicator internally, so does not and is unlikely to ever
+ * participate here.
+ *
  * \todo There is no resource for logging or reporting errors during initialization
- * \todo Clarify relationship with gmx::SimulationContext.
+ * \todo Remove this class by managing the MPI context with mpi4py and so
+ *       configuring the SimulationContext externally
  *
  * \ingroup gmxapi
  */
@@ -82,27 +83,14 @@ class MpiContextManager
         MpiContextManager()
         {
             gmx::init(nullptr, nullptr);
-#ifdef GMX_MPI
-#if GMX_MPI
-#if GMX_THREAD_MPI
-            // With thread-MPI, gmx_mpi_initialized() is false until after
-            // spawnThreads in the middle of gmx::Mdrunner::mdrunner(), but
-            // without thread-MPI, MPI is initialized by the time gmx::init()
-            // returns. In other words, this is not an effective context manager
-            // for thread-MPI, but it should be effective for MPI.
-            // \todo Distinguish scope / lifetime for comm resources from implementation details.
-            // \todo Normalize scope / lifetime of comm resources.
-#else
-            GMX_ASSERT(gmx_mpi_initialized(), "MPI should be initialized before reaching this point.");
-#endif // GMX_THREAD_MPI
-#endif // GMX_MPI
-#endif // defined(GMX_MPI)
+            GMX_RELEASE_ASSERT(!GMX_LIB_MPI || gmx_mpi_initialized(), "MPI should be initialized before reaching this point.");
         };
 
         ~MpiContextManager()
         {
-            // This is safe to call. It is a no-op if thread-MPI, and if the
-            // constructor completed then MPI is initialized.
+            // This is always safe to call. It is a no-op if
+            // thread-MPI, and if the constructor completed then the
+            // MPI library is initialized with reference counting.
             gmx::finalize();
         }
 
@@ -188,33 +176,27 @@ Status SessionImpl::run() noexcept
 std::unique_ptr<SessionImpl> SessionImpl::create(std::shared_ptr<ContextImpl>  context,
                                                  gmx::MdrunnerBuilder        &&runnerBuilder,
                                                  gmx::SimulationContext      &&simulationContext,
-                                                 gmx::LogFilePtr               logFilehandle,
-                                                 gmx_multisim_t              * multiSim)
+                                                 gmx::LogFilePtr               logFilehandle)
 {
     // We should be able to get a communicator (or subcommunicator) through the
     // Context.
     return std::make_unique<SessionImpl>(std::move(context),
                                          std::move(runnerBuilder),
                                          std::move(simulationContext),
-                                         std::move(logFilehandle),
-                                         multiSim);
+                                         std::move(logFilehandle));
 }
 
 SessionImpl::SessionImpl(std::shared_ptr<ContextImpl>  context,
                          gmx::MdrunnerBuilder        &&runnerBuilder,
                          gmx::SimulationContext      &&simulationContext,
-                         gmx::LogFilePtr               fplog,
-                         gmx_multisim_t              * multiSim) :
+                         gmx::LogFilePtr               fplog) :
     context_(std::move(context)),
     mpiContextManager_(std::make_unique<MpiContextManager>()),
     simulationContext_(std::move(simulationContext)),
-    logFilePtr_(std::move(fplog)),
-    multiSim_(multiSim)
+    logFilePtr_(std::move(fplog))
 {
     GMX_ASSERT(context_, "SessionImpl invariant implies valid ContextImpl handle.");
     GMX_ASSERT(mpiContextManager_, "SessionImpl invariant implies valid MpiContextManager guard.");
-    GMX_ASSERT(simulationContext_.communicationRecord_, "SessionImpl invariant implies valid commrec.");
-    GMX_UNUSED_VALUE(multiSim_);
 
     // \todo Session objects can have logic specialized for the runtime environment.
 
@@ -222,7 +204,6 @@ SessionImpl::SessionImpl(std::shared_ptr<ContextImpl>  context,
     signalManager_ = std::make_unique<SignalManager>(stopHandlerBuilder.get());
     GMX_ASSERT(signalManager_, "SessionImpl invariant includes a valid SignalManager.");
 
-    runnerBuilder.addCommunicationRecord(simulationContext_.communicationRecord_.get());
     runnerBuilder.addStopHandlerBuilder(std::move(stopHandlerBuilder));
     runner_ = std::make_unique<gmx::Mdrunner>(runnerBuilder.build());
     GMX_ASSERT(runner_, "SessionImpl invariant implies valid Mdrunner handle.");
@@ -235,14 +216,12 @@ SessionImpl::SessionImpl(std::shared_ptr<ContextImpl>  context,
 std::shared_ptr<Session> createSession(std::shared_ptr<ContextImpl>  context,
                                        gmx::MdrunnerBuilder        &&runnerBuilder,
                                        gmx::SimulationContext      &&simulationContext,
-                                       gmx::LogFilePtr               logFilehandle,
-                                       gmx_multisim_t              * multiSim)
+                                       gmx::LogFilePtr               logFilehandle)
 {
     auto newSession = SessionImpl::create(std::move(context),
                                           std::move(runnerBuilder),
                                           std::move(simulationContext),
-                                          std::move(logFilehandle),
-                                          multiSim);
+                                          std::move(logFilehandle));
     auto launchedSession = std::make_shared<Session>(std::move(newSession));
     return launchedSession;
 }
index 1ab8716beeeea67bdb3628e37a5d5f0749b0802a..764d153fcf358f7bf9e4926c8ec44f22d693a1b4 100644 (file)
@@ -114,7 +114,6 @@ class SessionImpl
          * \param runnerBuilder MD simulation builder to take ownership of.
          * \param simulationContext Take ownership of the simulation resources.
          * \param logFilehandle Take ownership of filehandle for MD logging
-         * \param multiSim Take ownership of resources for Mdrunner multi-sim.
          *
          * \todo Log file management will be updated soon.
          *
@@ -123,8 +122,7 @@ class SessionImpl
         static std::unique_ptr<SessionImpl> create(std::shared_ptr<ContextImpl>  context,
                                                    gmx::MdrunnerBuilder        &&runnerBuilder,
                                                    gmx::SimulationContext      &&simulationContext,
-                                                   gmx::LogFilePtr               logFilehandle,
-                                                   gmx_multisim_t              * multiSim);
+                                                   gmx::LogFilePtr               logFilehandle);
 
         /*!
          * \brief Add a restraint to the simulation.
@@ -184,14 +182,12 @@ class SessionImpl
          * \param runnerBuilder ownership of the MdrunnerBuilder object.
          * \param simulationContext take ownership of a SimulationContext
          * \param logFilehandle Take ownership of filehandle for MD logging
-         * \param multiSim Take ownership of resources for Mdrunner multi-sim.
          *
          */
         SessionImpl(std::shared_ptr<ContextImpl>  context,
                     gmx::MdrunnerBuilder        &&runnerBuilder,
                     gmx::SimulationContext      &&simulationContext,
-                    gmx::LogFilePtr               logFilehandle,
-                    gmx_multisim_t              * multiSim);
+                    gmx::LogFilePtr               logFilehandle);
 
     private:
         /*!
@@ -234,17 +230,6 @@ class SessionImpl
          */
         gmx::LogFilePtr logFilePtr_;
 
-        /*!
-         * \brief MultiSim resources for Mdrunner instance.
-         *
-         * May be null for no multi-simulation management at the Mdrunner level.
-         *
-         * \todo Lifetime of the multi-simulation handle is currently
-         * managed by LegacyMdrunOptions, but in the long term,
-         * session needs to manage it.
-         */
-        gmx_multisim_t* multiSim_;
-
         /*!
          * \brief Own and manager the signalling pathways for the current session.
          *
index 56726d19b77c1653976895c3d02dea9672134fc6..56c7093281c2ccab35a3a5c031500250c46d493d 100644 (file)
@@ -115,7 +115,7 @@ class ElectricFieldTest : public ::testing::Test
             std::vector<real> chargeA { 1 };
             md.homenr  = ssize(chargeA);
             md.chargeA = chargeA.data();
-            CommrecHandle        cr       = init_commrec();
+            CommrecHandle        cr       = init_commrec(MPI_COMM_WORLD, nullptr);
             matrix               boxDummy = { {0, 0, 0}, {0, 0, 0}, {0, 0, 0} };
             ForceProviderInput   forceProviderInput({}, md, 0.0, boxDummy, *cr);
 
index f185288c7d18cf81cde61d0840d317bd70333dd3..cc5dbd573d17ee0c6e87595a993023707e36112c 100644 (file)
@@ -46,6 +46,7 @@
 #include <cstring>
 
 #include "gromacs/commandline/filenm.h"
+#include "gromacs/mdrunutility/multisim.h"
 #include "gromacs/mdtypes/commrec.h"
 #include "gromacs/utility/basenetwork.h"
 #include "gromacs/utility/cstringutil.h"
 /* The source code in this file should be thread-safe.
       Please keep it that way. */
 
-void gmx_fill_commrec_from_mpi(t_commrec *cr)
-{
-#if !GMX_MPI
-    gmx_call("gmx_fill_commrec_from_mpi");
-    GMX_UNUSED_VALUE(cr);
-#else
-    if (!gmx_mpi_initialized())
-    {
-        gmx_comm("MPI has not been initialized properly");
-    }
-
-    cr->nnodes           = gmx_node_num();
-    cr->nodeid           = gmx_node_rank();
-    cr->sim_nodeid       = cr->nodeid;
-    cr->mpi_comm_mysim   = MPI_COMM_WORLD;
-    cr->mpi_comm_mygroup = MPI_COMM_WORLD;
-#endif
-}
-
-CommrecHandle init_commrec()
+CommrecHandle init_commrec(MPI_Comm              communicator,
+                           const gmx_multisim_t *ms)
 {
     CommrecHandle handle;
     t_commrec    *cr;
@@ -86,16 +69,38 @@ CommrecHandle init_commrec()
     snew(cr, 1);
     handle.reset(cr);
 
-#if GMX_LIB_MPI
-    gmx_fill_commrec_from_mpi(cr);
+    int rankInCommunicator, sizeOfCommunicator;
+#if GMX_MPI
+#  if GMX_LIB_MPI
+    GMX_RELEASE_ASSERT(gmx_mpi_initialized(), "Must have initialized MPI before building commrec");
+#  endif
+    MPI_Comm_rank(communicator, &rankInCommunicator);
+    MPI_Comm_size(communicator, &sizeOfCommunicator);
 #else
-    cr->mpi_comm_mysim   = MPI_COMM_NULL;
-    cr->mpi_comm_mygroup = MPI_COMM_NULL;
-    cr->nnodes           = 1;
-    cr->sim_nodeid       = 0;
-    cr->nodeid           = cr->sim_nodeid;
+    GMX_UNUSED_VALUE(communicator);
+    rankInCommunicator = 0;
+    sizeOfCommunicator = 1;
 #endif
 
+    if (ms != nullptr)
+    {
+#if GMX_MPI
+        cr->nnodes = sizeOfCommunicator / ms->nsim;
+        MPI_Comm_split(communicator, ms->sim, rankInCommunicator, &cr->mpi_comm_mysim);
+        cr->mpi_comm_mygroup = cr->mpi_comm_mysim;
+        MPI_Comm_rank(cr->mpi_comm_mysim, &cr->sim_nodeid);
+        MPI_Comm_rank(cr->mpi_comm_mygroup, &cr->nodeid);
+#endif
+    }
+    else
+    {
+        cr->nnodes           = sizeOfCommunicator;
+        cr->nodeid           = rankInCommunicator;
+        cr->sim_nodeid       = cr->nodeid;
+        cr->mpi_comm_mysim   = communicator;
+        cr->mpi_comm_mygroup = communicator;
+    }
+
     // TODO cr->duty should not be initialized here
     cr->duty = (DUTY_PP | DUTY_PME);
 
@@ -148,30 +153,6 @@ void done_commrec(t_commrec *cr)
     sfree(cr);
 }
 
-t_commrec *reinitialize_commrec_for_this_thread(const t_commrec *cro)
-{
-#if GMX_THREAD_MPI
-    t_commrec *cr;
-
-    /* make a thread-specific commrec */
-    snew(cr, 1);
-    /* now copy the whole thing, so settings like the number of PME nodes
-       get propagated. */
-    *cr = *cro;
-
-    /* and we start setting our own thread-specific values for things */
-    gmx_fill_commrec_from_mpi(cr);
-
-    // TODO cr->duty should not be initialized here
-    cr->duty             = (DUTY_PP | DUTY_PME);
-
-    return cr;
-#else
-    GMX_UNUSED_VALUE(cro);
-    return nullptr;
-#endif
-}
-
 void gmx_setup_nodecomm(FILE gmx_unused *fplog, t_commrec *cr)
 {
     gmx_nodecomm_t *nc;
index 086f2751b5508a39710b9582a99fc21089328a57..6d931f60de3257e759ae4303ed06cca8cc5d6b4d 100644 (file)
@@ -48,6 +48,7 @@
 #include "gromacs/utility/stringutil.h"
 #include "gromacs/utility/unique_cptr.h"
 
+struct gmx_multisim_t;
 struct t_commrec;
 struct t_filenm;
 
@@ -58,7 +59,8 @@ void done_commrec(t_commrec *cr);
 using CommrecHandle = gmx::unique_cptr<t_commrec, done_commrec>;
 
 //! Allocate, initialize and return the commrec.
-CommrecHandle init_commrec();
+CommrecHandle init_commrec(MPI_Comm              communicator,
+                           const gmx_multisim_t *ms);
 
 struct t_commrec *reinitialize_commrec_for_this_thread(const t_commrec *cro);
 
index a525c75ae9212f944abd4ee382e1a0ec2e1a8513..17c27c07bba8a3e6216db8184bb92f3c35697203 100644 (file)
@@ -45,7 +45,7 @@ class MDLogger;
 }
 
 /* Print information about the detected hardware to fplog (if != NULL)
- * and to stderr the master rank.
+ * and to stderr on the master rank of the master simulation.
  */
 void gmx_print_detected_hardware(FILE                *fplog,
                                  bool                 warnToStdErr,
index 6381102e69101145799a25fcd5b984122aa1f93e..2e4b88f3a0bfd31e3bc1efc9aef77e00c054ceec 100644 (file)
 
 #include "legacymdrunoptions.h"
 
-#include "config.h"
-
 #include <cstring>
 
-#include "gromacs/gmxlib/network.h"
 #include "gromacs/math/functions.h"
-#include "gromacs/mdrunutility/handlerestart.h"
-#include "gromacs/mdrunutility/multisim.h"
-#include "gromacs/mdtypes/commrec.h"
 #include "gromacs/utility/arraysize.h"
 #include "gromacs/utility/fatalerror.h"
 
@@ -138,29 +132,6 @@ int LegacyMdrunOptions::updateFromCommandLine(int argc, char **argv, ArrayRef<co
 
     hw_opt.threadAffinity = static_cast<ThreadAffinity>(nenum(thread_aff_opt_choices));
 
-    // now check for a multi-simulation
-    ArrayRef<const std::string> multidir = opt2fnsIfOptionSet("-multidir",
-                                                              ssize(filenames),
-                                                              filenames.data());
-    if (!multidir.empty())
-    {
-        ms = std::make_unique<gmx_multisim_t>(MPI_COMM_WORLD, multidir);
-    }
-
-    /* Prepare the intra-simulation communication */
-    // TODO consolidate this with init_commrec, after changing the
-    // relative ordering of init_commrec and ms initialization.
-    if (ms != nullptr)
-    {
-#if GMX_MPI
-        cr->nnodes = cr->nnodes / ms->nsim;
-        MPI_Comm_split(MPI_COMM_WORLD, ms->sim, cr->sim_nodeid, &cr->mpi_comm_mysim);
-        cr->mpi_comm_mygroup = cr->mpi_comm_mysim;
-        MPI_Comm_rank(cr->mpi_comm_mysim, &cr->sim_nodeid);
-        MPI_Comm_rank(cr->mpi_comm_mygroup, &cr->nodeid);
-#endif
-    }
-
     if (!opt2parg_bSet("-append", asize(pa), pa))
     {
         mdrunOptions.appendingBehavior = AppendingBehavior::Auto;
index 354fb30cbcd832f6b3ebf0f27a8683c5639c8e97..d5e47c872b47a03a0a1c212efc27b2ea8e1cb404 100644 (file)
 #include "gromacs/commandline/filenm.h"
 #include "gromacs/commandline/pargs.h"
 #include "gromacs/domdec/options.h"
-#include "gromacs/gmxlib/network.h"
 #include "gromacs/hardware/hw_info.h"
-#include "gromacs/mdrunutility/multisim.h"
 #include "gromacs/mdtypes/mdrunoptions.h"
 
 #include "replicaexchange.h"
 
-struct gmx_multisim_t;
-
 namespace gmx
 {
 
@@ -76,8 +72,8 @@ namespace gmx
  * that all of these declarations and defaults are local to the
  * modules.
  *
- * \todo Contextual aspects, such as working directory, MPI
- * environment, and environment variable handling are more properly
+ * \todo Contextual aspects, such as working directory
+ * and environment variable handling are more properly
  * the role of SimulationContext, and should be moved there */
 class LegacyMdrunOptions
 {
@@ -272,11 +268,6 @@ class LegacyMdrunOptions
         };
         /*! \} */
 
-        //! Handle to communication object.
-        CommrecHandle                   cr = init_commrec();
-        //! Multi-simulation object.
-        std::unique_ptr<gmx_multisim_t> ms;
-
         //! Parses the command-line input and prepares to start mdrun.
         int updateFromCommandLine(int argc, char **argv, ArrayRef<const char *> desc);
 
index cacf9fb7ef81ce922703eae5a11cafd490093ae2..00f8b7f79fae2f36c58639e107416e0b50d82799 100644 (file)
@@ -230,9 +230,6 @@ Mdrunner Mdrunner::cloneOnSpawnedThread() const
         newRunner.restraintManager_ = std::make_unique<RestraintManager>(*restraintManager_);
     }
 
-    // Copy original cr pointer before master thread can pass the thread barrier
-    newRunner.cr  = reinitialize_commrec_for_this_thread(cr);
-
     // Copy members of master runner.
     // \todo Replace with builder when Simulation context and/or runner phases are better defined.
     // Ref https://redmine.gromacs.org/issues/2587 and https://redmine.gromacs.org/issues/2375
@@ -250,14 +247,15 @@ Mdrunner Mdrunner::cloneOnSpawnedThread() const
     newRunner.nstlist_cmdline     = nstlist_cmdline;
     newRunner.replExParams        = replExParams;
     newRunner.pforce              = pforce;
+    // Give the spawned thread the newly created valid communicator
+    // for the simulation.
+    newRunner.communicator        = MPI_COMM_WORLD;
     newRunner.ms                  = ms;
     newRunner.startingBehavior    = startingBehavior;
     newRunner.stopHandlerBuilder_ = std::make_unique<StopHandlerBuilder>(*stopHandlerBuilder_);
 
     threadMpiMdrunnerAccessBarrier();
 
-    GMX_RELEASE_ASSERT(!MASTER(newRunner.cr), "cloneOnSpawnedThread should only be called on spawned threads");
-
     return newRunner;
 }
 
@@ -282,21 +280,8 @@ static void mdrunner_start_fn(const void *arg)
 }
 
 
-/*! \brief Start thread-MPI threads.
- *
- * Called by mdrunner() to start a specific number of threads
- * (including the main thread) for thread-parallel runs. This in turn
- * calls mdrunner() for each thread. All options are the same as for
- * mdrunner(). */
-t_commrec *Mdrunner::spawnThreads(int numThreadsToLaunch) const
+void Mdrunner::spawnThreads(int numThreadsToLaunch)
 {
-
-    /* first check whether we even need to start tMPI */
-    if (numThreadsToLaunch < 2)
-    {
-        return cr;
-    }
-
 #if GMX_THREAD_MPI
     /* now spawn new threads that start mdrunner_start_fn(), while
        the main thread returns. Thread affinity is handled later. */
@@ -306,12 +291,14 @@ t_commrec *Mdrunner::spawnThreads(int numThreadsToLaunch) const
         GMX_THROW(gmx::InternalError("Failed to spawn thread-MPI threads"));
     }
 
+    // Give the master thread the newly created valid communicator for
+    // the simulation.
+    communicator = MPI_COMM_WORLD;
     threadMpiMdrunnerAccessBarrier();
 #else
+    GMX_UNUSED_VALUE(numThreadsToLaunch);
     GMX_UNUSED_VALUE(mdrunner_start_fn);
 #endif
-
-    return reinitialize_commrec_for_this_thread(cr);
 }
 
 }  // namespace gmx
@@ -449,14 +436,15 @@ static bool gpuAccelerationOfNonbondedIsUseful(const MDLogger   &mdlog,
 }
 
 //! Initializes the logger for mdrun.
-static gmx::LoggerOwner buildLogger(FILE *fplog, const t_commrec *cr)
+static gmx::LoggerOwner buildLogger(FILE      *fplog,
+                                    const bool isSimulationMasterRank)
 {
     gmx::LoggerBuilder builder;
     if (fplog != nullptr)
     {
         builder.addTargetFile(gmx::MDLogger::LogLevel::Info, fplog);
     }
-    if (cr == nullptr || SIMMASTER(cr))
+    if (isSimulationMasterRank)
     {
         builder.addTargetStream(gmx::MDLogger::LogLevel::Warning,
                                 &gmx::TextOutputFile::standardError());
@@ -661,10 +649,7 @@ int Mdrunner::mdrunner()
     auto       updateTarget    = findTaskTarget(update_opt);
     PmeRunMode pmeRunMode      = PmeRunMode::None;
 
-    // Here we assume that SIMMASTER(cr) does not change even after the
-    // threads are started.
-
-    FILE *fplog = nullptr;
+    FILE      *fplog = nullptr;
     // If we are appending, we don't write log output because we need
     // to check that the old log file matches what the checkpoint file
     // expects. Otherwise, we should start to write log output now if
@@ -673,19 +658,25 @@ int Mdrunner::mdrunner()
     {
         fplog = gmx_fio_getfp(logFileHandle);
     }
-    gmx::LoggerOwner logOwner(buildLogger(fplog, cr));
+    const bool       isSimulationMasterRank = findIsSimulationMasterRank(ms, communicator);
+    gmx::LoggerOwner logOwner(buildLogger(fplog, isSimulationMasterRank));
     gmx::MDLogger    mdlog(logOwner.logger());
 
     // report any development features that may be enabled by environment variables
     manageDevelopmentFeatures(mdlog);
 
-    // With thread-MPI, the communicator changes after threads are
-    // launched, so this is rebuilt for the master rank at that
-    // time. The non-master ranks are fine to keep the one made here.
-    PhysicalNodeCommunicator physicalNodeComm(MPI_COMM_WORLD, gmx_physicalnode_id_hash());
+    // TODO The thread-MPI master rank makes a working
+    // PhysicalNodeCommunicator here, but it gets rebuilt by all ranks
+    // after the threads have been launched. This works because no use
+    // is made of that communicator until after the execution paths
+    // have rejoined. But it is likely that we can improve the way
+    // this is expressed, e.g. by expressly running detection only the
+    // master rank for thread-MPI, rather than relying on the mutex
+    // and reference count.
+    PhysicalNodeCommunicator physicalNodeComm(communicator, gmx_physicalnode_id_hash());
     hwinfo = gmx_detect_hardware(mdlog, physicalNodeComm);
 
-    gmx_print_detected_hardware(fplog, isMasterSimMasterRank(ms, MASTER(cr)), mdlog, hwinfo);
+    gmx_print_detected_hardware(fplog, isSimulationMasterRank && isMasterSim(ms), mdlog, hwinfo);
 
     std::vector<int> gpuIdsToUse = makeGpuIdsToUse(hwinfo->gpu_info, hw_opt.gpuIdsAvailable);
 
@@ -699,7 +690,7 @@ int Mdrunner::mdrunner()
 
     auto                     partialDeserializedTpr = std::make_unique<PartialDeserializedTprFile>();
 
-    if (SIMMASTER(cr))
+    if (isSimulationMasterRank)
     {
         /* Only the master rank has the global state */
         globalState = std::make_unique<t_state>();
@@ -712,9 +703,9 @@ int Mdrunner::mdrunner()
     }
 
     /* Check and update the hardware options for internal consistency */
-    checkAndUpdateHardwareOptions(mdlog, &hw_opt, SIMMASTER(cr), domdecOptions.numPmeRanks, inputrec);
+    checkAndUpdateHardwareOptions(mdlog, &hw_opt, isSimulationMasterRank, domdecOptions.numPmeRanks, inputrec);
 
-    if (GMX_THREAD_MPI && SIMMASTER(cr))
+    if (GMX_THREAD_MPI && isSimulationMasterRank)
     {
         bool useGpuForNonbonded = false;
         bool useGpuForPme       = false;
@@ -753,20 +744,21 @@ int Mdrunner::mdrunner()
                                                 doMembed);
 
         // Now start the threads for thread MPI.
-        cr = spawnThreads(hw_opt.nthreads_tmpi);
-        /* The main thread continues here with a new cr. We don't deallocate
-           the old cr because other threads may still be reading it. */
-        // TODO Both master and spawned threads call dup_tfn and
-        // reinitialize_commrec_for_this_thread. Find a way to express
-        // this better.
-        physicalNodeComm = PhysicalNodeCommunicator(MPI_COMM_WORLD, gmx_physicalnode_id_hash());
+        spawnThreads(hw_opt.nthreads_tmpi);
+        // The spawned threads enter mdrunner() and execution of
+        // master and spawned threads joins at the end of this block.
+        physicalNodeComm = PhysicalNodeCommunicator(communicator, gmx_physicalnode_id_hash());
     }
-    // END OF CAUTION: cr and physicalNodeComm are now reliable
+
+    GMX_RELEASE_ASSERT(communicator == MPI_COMM_WORLD, "Must have valid world communicator");
+    CommrecHandle crHandle = init_commrec(communicator, ms);
+    t_commrec    *cr       = crHandle.get();
+    GMX_RELEASE_ASSERT(cr != nullptr, "Must have valid commrec");
 
     if (PAR(cr))
     {
         /* now broadcast everything to the non-master nodes/threads: */
-        if (!SIMMASTER(cr))
+        if (!isSimulationMasterRank)
         {
             inputrec = &inputrecInstance;
         }
@@ -775,10 +767,9 @@ int Mdrunner::mdrunner()
     GMX_RELEASE_ASSERT(inputrec != nullptr, "All ranks should have a valid inputrec now");
     partialDeserializedTpr.reset(nullptr);
 
-    // Now each rank knows the inputrec that SIMMASTER read and used,
-    // and (if applicable) cr->nnodes has been assigned the number of
-    // thread-MPI ranks that have been chosen. The ranks can now all
-    // run the task-deciding functions and will agree on the result
+    // Now the number of ranks is known to all ranks, and each knows
+    // the inputrec read by the master rank. The ranks can now all run
+    // the task-deciding functions and will agree on the result
     // without needing to communicate.
     //
     // TODO Should we do the communication in debug mode to support
@@ -1000,7 +991,7 @@ int Mdrunner::mdrunner()
             // Now we can start normal logging to the truncated log file.
             fplog    = gmx_fio_getfp(logFileHandle);
             prepareLogAppending(fplog);
-            logOwner = buildLogger(fplog, cr);
+            logOwner = buildLogger(fplog, MASTER(cr));
             mdlog    = logOwner.logger();
         }
     }
@@ -1673,7 +1664,8 @@ class Mdrunner::BuilderImplementation
 {
     public:
         BuilderImplementation() = delete;
-        BuilderImplementation(std::unique_ptr<MDModules> mdModules);
+        BuilderImplementation(std::unique_ptr<MDModules>           mdModules,
+                              compat::not_null<SimulationContext*> context);
         ~BuilderImplementation();
 
         BuilderImplementation &setExtraMdrunOptions(const MdrunOptions &options,
@@ -1686,10 +1678,6 @@ class Mdrunner::BuilderImplementation
 
         void addReplicaExchange(const ReplicaExchangeParameters &params);
 
-        void addMultiSim(gmx_multisim_t* multisim);
-
-        void addCommunicationRecord(t_commrec *cr);
-
         void addNonBonded(const char* nbpu_opt);
 
         void addPME(const char* pme_opt_, const char* pme_fft_opt_);
@@ -1731,10 +1719,10 @@ class Mdrunner::BuilderImplementation
         int         nstlist_ = 0;
 
         //! Multisim communicator handle.
-        std::unique_ptr<gmx_multisim_t*>      multisim_ = nullptr;
+        gmx_multisim_t *multiSimulation_;
 
-        //! Non-owning communication record (only used when building command-line mdrun)
-        t_commrec *communicationRecord_ = nullptr;
+        //! mdrun communicator
+        MPI_Comm communicator_ = MPI_COMM_NULL;
 
         //! Print a warning if any force is larger than this (in kJ/mol nm).
         real forceWarningThreshold_ = -1;
@@ -1773,9 +1761,12 @@ class Mdrunner::BuilderImplementation
         std::unique_ptr<StopHandlerBuilder> stopHandlerBuilder_ = nullptr;
 };
 
-Mdrunner::BuilderImplementation::BuilderImplementation(std::unique_ptr<MDModules> mdModules) :
+Mdrunner::BuilderImplementation::BuilderImplementation(std::unique_ptr<MDModules>           mdModules,
+                                                       compat::not_null<SimulationContext*> context) :
     mdModules_(std::move(mdModules))
 {
+    communicator_    = context->communicator_;
+    multiSimulation_ = context->multiSimulation_.get();
 }
 
 Mdrunner::BuilderImplementation::~BuilderImplementation() = default;
@@ -1806,16 +1797,6 @@ void Mdrunner::BuilderImplementation::addReplicaExchange(const ReplicaExchangePa
     replicaExchangeParameters_ = params;
 }
 
-void Mdrunner::BuilderImplementation::addMultiSim(gmx_multisim_t* multisim)
-{
-    multisim_ = std::make_unique<gmx_multisim_t*>(multisim);
-}
-
-void Mdrunner::BuilderImplementation::addCommunicationRecord(t_commrec *cr)
-{
-    communicationRecord_ = cr;
-}
-
 Mdrunner Mdrunner::BuilderImplementation::build()
 {
     auto newRunner = Mdrunner(std::move(mdModules_));
@@ -1835,18 +1816,10 @@ Mdrunner Mdrunner::BuilderImplementation::build()
 
     newRunner.filenames = filenames_;
 
-    GMX_ASSERT(communicationRecord_, "Bug found. It should not be possible to call build() without a valid communicationRecord_.");
-    newRunner.cr = communicationRecord_;
+    newRunner.communicator = communicator_;
 
-    if (multisim_)
-    {
-        // nullptr is a valid value for the multisim handle, so we don't check the pointed-to pointer.
-        newRunner.ms = *multisim_;
-    }
-    else
-    {
-        GMX_THROW(gmx::APIError("MdrunnerBuilder::addMultiSim() is required before build()"));
-    }
+    // nullptr is a valid value for the multisim handle
+    newRunner.ms = multiSimulation_;
 
     // \todo Clarify ownership and lifetime management for gmx_output_env_t
     // \todo Update sanity checking when output environment has clearly specified invariants.
@@ -1961,8 +1934,9 @@ void Mdrunner::BuilderImplementation::addStopHandlerBuilder(std::unique_ptr<Stop
     stopHandlerBuilder_ = std::move(builder);
 }
 
-MdrunnerBuilder::MdrunnerBuilder(std::unique_ptr<MDModules>           mdModules) :
-    impl_ {std::make_unique<Mdrunner::BuilderImplementation>(std::move(mdModules))}
+MdrunnerBuilder::MdrunnerBuilder(std::unique_ptr<MDModules>           mdModules,
+                                 compat::not_null<SimulationContext*> context) :
+    impl_ {std::make_unique<Mdrunner::BuilderImplementation>(std::move(mdModules), context)}
 {
 }
 
@@ -1994,18 +1968,6 @@ MdrunnerBuilder &MdrunnerBuilder::addReplicaExchange(const ReplicaExchangeParame
     return *this;
 }
 
-MdrunnerBuilder &MdrunnerBuilder::addMultiSim(gmx_multisim_t* multisim)
-{
-    impl_->addMultiSim(multisim);
-    return *this;
-}
-
-MdrunnerBuilder &MdrunnerBuilder::addCommunicationRecord(t_commrec *cr)
-{
-    impl_->addCommunicationRecord(cr);
-    return *this;
-}
-
 MdrunnerBuilder &MdrunnerBuilder::addNonBonded(const char* nbpu_opt)
 {
     impl_->addNonBonded(nbpu_opt);
index 5c7a9a1d38d6324bf3b3c1965a96fbc917a1b42a..9c642f279e32f09d5bd33464b3e37c3f9409d9ed 100644 (file)
@@ -48,6 +48,7 @@
 #include <memory>
 
 #include "gromacs/commandline/filenm.h"
+#include "gromacs/compat/pointers.h"
 #include "gromacs/domdec/options.h"
 #include "gromacs/hardware/hw_info.h"
 #include "gromacs/math/vec.h"
 #include "gromacs/mdrunutility/handlerestart.h"
 #include "gromacs/mdtypes/mdrunoptions.h"
 #include "gromacs/utility/basedefinitions.h"
+#include "gromacs/utility/gmxmpi.h"
 #include "gromacs/utility/real.h"
 
 #include "replicaexchange.h"
 
+struct gmx_multisim_t;
 struct gmx_output_env_t;
 struct ReplicaExchangeParameters;
-struct t_commrec;
 struct t_fileio;
 
 namespace gmx
@@ -167,8 +169,14 @@ class Mdrunner
         void addPotential(std::shared_ptr<IRestraintPotential> restraint,
                           const std::string                   &name);
 
-        //! Called when thread-MPI spawns threads.
-        t_commrec *spawnThreads(int numThreadsToLaunch) const;
+        /*! \brief Prepare the thread-MPI communicator to have \c
+         * numThreadsToLaunch ranks, by spawning new thread-MPI
+         * threads.
+         *
+         * Called by mdrunner() to start a specific number of threads
+         * (including the main thread) for thread-parallel runs. This
+         * in turn calls mdrunner() for each thread. */
+        void spawnThreads(int numThreadsToLaunch);
 
         /*! \brief Initializes a new Mdrunner from the master.
          *
@@ -256,8 +264,12 @@ class Mdrunner
         //! \brief Non-owning handle to file used for logging.
         t_fileio                               *logFileHandle = nullptr;
 
-        //! \brief Non-owning handle to communication data structure.
-        t_commrec                              *cr = nullptr;
+        /*! \brief Non-owning handle to communication data structure.
+         *
+         * With real MPI, gets a value from the SimulationContext
+         * supplied to the MdrunnerBuilder. With thread-MPI gets a
+         * value after threads have been spawned. */
+        MPI_Comm                                communicator = MPI_COMM_NULL;
 
         //! \brief Non-owning handle to multi-simulation handler.
         gmx_multisim_t                         *ms = nullptr;
@@ -339,13 +351,15 @@ class MdrunnerBuilder final
          * \brief Constructor requires a handle to a SimulationContext to share.
          *
          * \param mdModules  The handle to the set of modules active in mdrun
+         * \param context    Required handle to simulation context
          *
          * The calling code must guarantee that the
          * pointer remains valid for the lifetime of the builder, and that the
          * resources retrieved from the context remain valid for the lifetime of
          * the runner produced.
          */
-        explicit MdrunnerBuilder(std::unique_ptr<MDModules> mdModules);
+        explicit MdrunnerBuilder(std::unique_ptr<MDModules>           mdModules,
+                                 compat::not_null<SimulationContext*> context);
 
         //! \cond
         MdrunnerBuilder() = delete;
@@ -458,31 +472,6 @@ class MdrunnerBuilder final
          */
         MdrunnerBuilder &addUpdateTaskAssignment(const char *update_opt);
 
-        /*!
-         * \brief Provide access to the multisim communicator to use.
-         *
-         * \param multisim borrowed handle to multisim record.
-         *
-         * Required. Client should create a `gmx_multisim_t` and pass its address.
-         * Client is responsible for calling done_multisim() on the handle after
-         * simulation.
-         *
-         * \internal Pointer is copied, but calling code retains ownership and
-         * responsibility for multisim. Mdrunner must not do anything that would
-         * invalidate the original copied-from pointer.
-         *
-         * \todo Clarify semantics of specifying optional multisim work
-         * \todo Clarify ownership and management of multisim resources.
-         */
-        MdrunnerBuilder &addMultiSim(gmx_multisim_t* multisim);
-
-        /*!
-         * \brief Provide a commrec for mdrun to use.
-         *
-         * This will come from the simulation context, but differently
-         * for command-line gmx and a gmxapi Context. */
-        MdrunnerBuilder &addCommunicationRecord(t_commrec *cr);
-
         /*!
          * \brief Set MD options not owned by some other module.
          *
index fe3b37e2fb7163027a22f79ffdc239c617c9562b..a74d2744ce37d418bb90f7242d0902090b56f7e8 100644 (file)
 
 #include "simulationcontext.h"
 
-#include <cassert>
+#include "config.h"
 
-#include <memory>
+#include <cassert>
 
-#include "gromacs/commandline/filenm.h"
-#include "gromacs/domdec/domdec.h"
-#include "gromacs/fileio/oenv.h"
-#include "gromacs/hardware/hw_info.h"
-#include "gromacs/mdtypes/commrec.h"
+#include "gromacs/mdrunutility/multisim.h"
 #include "gromacs/utility/gmxassert.h"
+#include "gromacs/utility/gmxmpi.h"
 
 #include "runner.h"
 
 namespace gmx
 {
 //! \cond libapi
-SimulationContext::SimulationContext(CommrecHandle communicationRecord) :
-    communicationRecord_(std::move(communicationRecord))
+SimulationContext::SimulationContext(MPI_Comm                          communicator,
+                                     const ArrayRef<const std::string> multiSimDirectoryNames) :
+    communicator_(communicator),
+    multiSimulation_()
 {
-    // SimulationContext receives a valid, initialized communications record.
-    // \todo Initialization of the communications record is an implementation detail
-    // dependent on the build configuration of the GROMACS installation and on
-    // the run time computing context, not the client code that owns this Context.
-    GMX_RELEASE_ASSERT(communicationRecord_, "SimulationContext constructor needs initialized communicationRecord");
+    GMX_RELEASE_ASSERT((GMX_LIB_MPI && (communicator != MPI_COMM_NULL)) ||
+                       (!GMX_LIB_MPI && (communicator == MPI_COMM_NULL)),
+                       "With real MPI, a non-null communicator is required. "
+                       "Without it, the communicator must be null.");
+    if (!multiSimDirectoryNames.empty())
+    {
+        multiSimulation_ = std::make_unique<gmx_multisim_t>(communicator, multiSimDirectoryNames);
+    }
 }
 
-SimulationContext::SimulationContext(SimulationContext &&source) noexcept
-    : communicationRecord_(std::move(source.communicationRecord_))
-{
-}
-
-SimulationContext
-createSimulationContext(CommrecHandle simulationCommunicator)
-{
-    auto context = SimulationContext(std::move(simulationCommunicator));
-    return context;
-}
 //! \endcond
 } // end namespace gmx
index 66660652a9a8e7044c1d6d40b834fd313df38afc..1cb5b510365b391605f5540d30fda4f8f669477d 100644 (file)
  */
 
 #include <memory>
+#include <string>
 
-#include "gromacs/gmxlib/network.h"
+#include "gromacs/mdrunutility/multisim.h"
+#include "gromacs/utility/gmxmpi.h"
+
+struct gmx_multisim_t;
 
 namespace gmx
 {
 
+template <typename T> class ArrayRef;
+
 /*! \cond libapi
  * \libinternal
  * \brief Simulation environment and configuration.
@@ -62,6 +68,13 @@ namespace gmx
  * runtime parameters and resources from client code. The client retains ownership
  * of the context and its resources, with exceptions as noted.
  *
+ * A client must share ownership of some resources and transfer ownership of
+ * other resources to create or configure the context. The simulation may in
+ * turn consume or borrow some resources from the context. This is a new
+ * framework that will evolve in the contexts of
+ * https://redmine.gromacs.org/issues/2375
+ * https://redmine.gromacs.org/issues/2587
+ *
  * The public interface of SimulationContext is not yet well-specified.
  * Client code can create an instance with gmx::createSimulationContext()
  *
@@ -73,8 +86,7 @@ namespace gmx
  *
  * \internal
  * This is a minimal placeholder for a more complete implementation.
- * Interfaces for different API levels are not yet final, but also depend on
- * additional development of t_commrec and other resources.
+ * Interfaces for different API levels are not yet final.
  * \todo Impose sensible access restrictions.
  * Either the SimulationContext should be passed to the Mdrunner as logically constant or
  * a separate handle class can provide access to resources that have been
@@ -92,55 +104,44 @@ class SimulationContext final
          * \brief Object must be initialized with non-default constructor.
          */
         SimulationContext() = delete;
-
         /*!
-         * \brief Initializate with borrowed values.
+         * \brief Construct
          *
-         * \param communicationRecord non-owning communication record handle.
+         * \param communicator            MPI communicator for this (set of) simulations
+         * \param multiSimDirectoryNames  Names of any directories used with -multidir
+         */
+        explicit SimulationContext(MPI_Comm                    communicator,
+                                   ArrayRef<const std::string> multiSimDirectoryNames);
+
+        /*!
+         * \brief MPI communicator object for this simulation object.
          *
-         * Client code is responsible for cleaning up communicationRecord after
-         * SimulationContext is destroyed.
+         * With real MPI,
+         *   the gmx wrapper binary has called MPI_Init, thus
+         *     MPI_COMM_WORLD is now valid to use, and
+         *   (in future) the gmxapi runner will handle such details
+         *     (e.g. via mpi4py) before creating its SimulationContext.
+         * In both cases, if a multi-simulation is in use, then its
+         * communicator(s) are found in multiSimulation_. This
+         * communicator is that of all ranks from all simulations, and
+         * will later be split into one for each simulation.
+         * TODO Perhaps (for simplicity) that communicator splitting
+         * task can be undertaken during multi-sim setup.
          *
-         * \internal
-         * SimulationContext should be the owner of these objects and these implementation
-         * details are subject to change as ownership semantics are clarified in future
-         * development.
+         * With thread-MPI in both cases, the communicator is set up later
+         * during the process of spawning the threads that will be the MPI
+         * ranks. (Multi-simulation is not supported with thread-MPI.)
          */
-        explicit SimulationContext(CommrecHandle communicationRecord);
-        //! Move constructor
-        SimulationContext(SimulationContext && source) noexcept;
+        MPI_Comm communicator_ = MPI_COMM_NULL;
 
-        /*! \brief Communicator handle. */
-        CommrecHandle communicationRecord_;
+        /*!
+         * \brief Multi-sim handler (if required by e.g. gmx mdrun
+         * -multidir; only supported with real MPI)
+         */
+        std::unique_ptr<gmx_multisim_t> multiSimulation_;
 };
 //! \endcond
 
-/*! \cond libapi
- * \brief Get ownership of a new SimulationContext object.
- *
- * A client must share ownership of some resources and transfer ownership of
- * other resources to create or configure the context. The simulation may in
- * turn consume or borrow some resources from the context. This is a new
- * framework that will evolve in the contexts of
- * https://redmine.gromacs.org/issues/2375
- * https://redmine.gromacs.org/issues/2587
- * https://redmine.gromacs.org/issues/2605
- *
- * Ownership is returned as a smart pointer because ownership or reference may need to
- * be transferred through higher level code or maintained as a heap object,
- * and there is not yet a public interface, wrapper, or separate handle object.
- *
- * This call signature sets all parameters processed by the command-line client
- * code. Additional call signatures can allow implementation-specified default
- * values or newer initialization interfaces.
- *
- * \param simulationCommunicator Handle to communication data structure.
- *
- * \ingroup module_mdrun
- */
-SimulationContext createSimulationContext(CommrecHandle simulationCommunicator);
-//! \endcond
-
 }      // end namespace gmx
 
 #endif //GROMACS_SIMULATIONCONTEXT_H
index a33f433eb27c5365e1b1b3042f876cf907d8125e..964bfc48f7d38a7ff4556f2e61e692d3d86594c0 100644 (file)
@@ -70,9 +70,7 @@
 #include "gromacs/compat/optional.h"
 #include "gromacs/fileio/checkpoint.h"
 #include "gromacs/fileio/gmxfio.h"
-#include "gromacs/gmxlib/network.h"
 #include "gromacs/mdrunutility/multisim.h"
-#include "gromacs/mdtypes/commrec.h"
 #include "gromacs/mdtypes/mdrunoptions.h"
 #include "gromacs/utility/basedefinitions.h"
 #include "gromacs/utility/enumerationhelpers.h"
@@ -608,7 +606,8 @@ StartingBehaviorHandler::makeIndexOfNextPart(const AppendingBehavior appendingBe
 }   // namespace
 
 std::tuple<StartingBehavior, LogFilePtr>
-handleRestart(t_commrec              *cr,
+handleRestart(const bool              isSimulationMaster,
+              MPI_Comm                communicator,
               const gmx_multisim_t   *ms,
               const AppendingBehavior appendingBehavior,
               const int               nfile,
@@ -625,7 +624,7 @@ handleRestart(t_commrec              *cr,
     // Only the master rank of each simulation can do anything with
     // output files, so it is the only one that needs to consider
     // whether a restart might take place, and how to implement it.
-    if (MASTER(cr))
+    if (isSimulationMaster)
     {
         try
         {
@@ -667,18 +666,15 @@ handleRestart(t_commrec              *cr,
     //
     // TODO Evolve some re-usable infrastructure for this, because it
     // will be needed in many places while setting up simulations.
-    if (PAR(cr))
-    {
-        gmx_sumi(1, &numErrorsFound, cr);
-    }
-    if (isMultiSim(ms))
-    {
-        gmx_sumi_sim(1, &numErrorsFound, ms);
-        if (PAR(cr))
-        {
-            gmx_bcast(sizeof(numErrorsFound), &numErrorsFound, cr);
-        }
-    }
+#if GMX_LIB_MPI
+    int reducedNumErrorsFound;
+    MPI_Allreduce(&numErrorsFound, &reducedNumErrorsFound, 1, MPI_INT, MPI_SUM, communicator);
+    numErrorsFound = reducedNumErrorsFound;
+#else
+    // There is nothing to do with no MPI or thread-MPI, as there is
+    // only one rank at this point.
+    GMX_RELEASE_ASSERT(communicator == MPI_COMM_NULL, "Must have null communicator at this point");
+#endif
 
     // Throw in a globally coordinated way, if needed
     if (numErrorsFound > 0)
@@ -692,10 +688,14 @@ handleRestart(t_commrec              *cr,
             GMX_THROW(ParallelConsistencyError("Another MPI rank encountered an exception"));
         }
     }
-    if (PAR(cr))
-    {
-        gmx_bcast(sizeof(handler.startingBehavior), &handler.startingBehavior, cr);
-    }
+
+    // Ensure all ranks agree on the starting behavior, which is easy
+    // because all simulations in a multi-simulation already agreed on
+    // the starting behavior. There is nothing to do with no
+    // MPI or thread-MPI.
+#if GMX_LIB_MPI
+    MPI_Bcast(&handler.startingBehavior, 1, MPI_INT, 0, communicator);
+#endif
 
     return std::make_tuple(handler.startingBehavior, std::move(logFileGuard));
 }
index 3ff99de1b4c8891dc4439aac4c6d6f3dde3d45c4..5eeb4799e2700a5e461d5fa5e60243688541eae0 100644 (file)
@@ -60,9 +60,9 @@
 
 #include "gromacs/mdrunutility/logging.h"
 #include "gromacs/utility/basedefinitions.h"
+#include "gromacs/utility/gmxmpi.h"
 
 struct gmx_multisim_t;
-struct t_commrec;
 struct t_filenm;
 
 namespace gmx
@@ -105,7 +105,8 @@ enum class StartingBehavior : int
  * \throws GromacsException        On ranks upon which the error condition was
  *                                 not detected.
  *
- * \param[in]    cr                 Communication structure
+ * \param[in]    isSimulationMaster Whether this rank is the master rank of a simulation
+ * \param[in]    communicator       MPI communicator
  * \param[in]    ms                 Handles multi-simulations.
  * \param[in]    appendingBehavior  User choice for appending
  * \param[in]    nfile              Size of fnm struct
@@ -113,7 +114,8 @@ enum class StartingBehavior : int
  *
  * \return  Description of how mdrun is starting */
 std::tuple<StartingBehavior, LogFilePtr>
-handleRestart(t_commrec            *cr,
+handleRestart(bool                  isSimulationMaster,
+              MPI_Comm              communicator,
               const gmx_multisim_t *ms,
               AppendingBehavior     appendingBehavior,
               int                   nfile,
index 21f8f69c46ea80fd0d4a7a57af9225daf51ffaad..a671a1271e5c50f1c7283bd591ce026b50a67873 100644 (file)
 
 #include "config.h"
 
-#include "gromacs/gmxlib/network.h"
 #include "gromacs/mdtypes/commrec.h"
 #include "gromacs/utility/fatalerror.h"
 #include "gromacs/utility/futil.h"
+#include "gromacs/utility/gmxassert.h"
+#include "gromacs/utility/mpiinplacebuffers.h"
 #include "gromacs/utility/smalloc.h"
 
 gmx_multisim_t::gmx_multisim_t() = default;
@@ -102,7 +103,7 @@ gmx_multisim_t::gmx_multisim_t(MPI_Comm                         comm,
     MPI_Group mpi_group_world;
     MPI_Comm_group(comm, &mpi_group_world);
     MPI_Group_incl(mpi_group_world, nsim, rank.data(), &mpi_group_masters);
-    MPI_Comm_create(MPI_COMM_WORLD, mpi_group_masters,
+    MPI_Comm_create(comm, mpi_group_masters,
                     &mpi_comm_masters);
 
 #if !MPI_IN_PLACE_EXISTS
@@ -386,6 +387,40 @@ void check_multi_int64(FILE *log, const gmx_multisim_t *ms,
     sfree(ibuf);
 }
 
+bool findIsSimulationMasterRank(const gmx_multisim_t *ms,
+                                MPI_Comm              communicator)
+{
+    if (GMX_LIB_MPI)
+    {
+        // Ranks of multi-simulations know whether they are a master
+        // rank. Ranks of non-multi simulation do not know until a
+        // t_commrec is available.
+        if ((ms != nullptr) && (ms->nsim > 1))
+        {
+            return ms->mpi_comm_masters != MPI_COMM_NULL;
+        }
+        else
+        {
+#if GMX_LIB_MPI
+            int rank;
+            MPI_Comm_rank(communicator, &rank);
+            return (rank == 0);
+#endif
+        }
+    }
+    if (GMX_THREAD_MPI)
+    {
+        GMX_RELEASE_ASSERT(communicator == MPI_COMM_NULL ||
+                           communicator == MPI_COMM_WORLD, "Invalid communicator");
+        // Spawned threads have MPI_COMM_WORLD upon creation, so if
+        // the communicator is MPI_COMM_NULL this is not a spawned thread,
+        // ie is the master thread
+        return (communicator == MPI_COMM_NULL);
+    }
+    // No MPI means it must be the master (and only) rank.
+    return true;
+}
+
 bool isMasterSim(const gmx_multisim_t *ms)
 {
     return !isMultiSim(ms) || ms->sim == 0;
index 8c109fe8c43f7a11361c479fb10380add89bf171..e4be0dffdffccb77780837cdf0bcaa28d0709f83 100644 (file)
@@ -72,14 +72,6 @@ struct gmx_multisim_t
     //! Destructor
     ~gmx_multisim_t();
 
-    //! Are we the master simulation of a possible multi-simulation?
-    bool isMasterSim() const;
-
-    /*! \brief Are we the master rank (of the master simulation, for a multi-sim).
-     *
-     * This rank prints the remaining run time etc. */
-    bool isMasterSimMasterRank(bool isMaster) const;
-
     //! The number of simulations in the set of multi-simulations
     int                                 nsim              = 1;
     //! The index of the simulation that owns this object within the set
@@ -137,6 +129,12 @@ static bool inline isMultiSim(const gmx_multisim_t *ms)
     return ms != nullptr;
 }
 
+/*! \brief Return whether this rank is the master rank of a
+ * simulation, using \c ms (if it is valid) and otherwise \c
+ * communicator */
+bool findIsSimulationMasterRank(const gmx_multisim_t *ms,
+                                MPI_Comm              communicator);
+
 //! Are we the master simulation of a possible multi-simulation?
 bool isMasterSim(const gmx_multisim_t *ms);
 
index bc576c1ea144aaf6fac4f07d907d1018bd957597..24f922a83beac4afd8c4dcbd3aea5e7524cb1951 100644 (file)
@@ -1114,7 +1114,7 @@ int gmx_pme_error(int argc, char *argv[])
 
 #define NFILE asize(fnm)
 
-    CommrecHandle commrecHandle = init_commrec();
+    CommrecHandle commrecHandle = init_commrec(MPI_COMM_WORLD, nullptr);
     t_commrec    *cr            = commrecHandle.get();
     PCA_Flags  = PCA_NOEXIT_ON_ARGS;
 
index ce690eb9cdaa749e7d0b3f2c01ab662d8671c99e..9d62cd6371639c1a4de75268113985bbc04c6323 100644 (file)
  */
 #include "gmxpre.h"
 
+#include "config.h"
+
 #include <memory>
 
 #include "gromacs/commandline/pargs.h"
 #include "gromacs/domdec/options.h"
 #include "gromacs/fileio/gmxfio.h"
-#include "gromacs/gmxlib/network.h"
 #include "gromacs/mdrun/legacymdrunoptions.h"
 #include "gromacs/mdrun/runner.h"
 #include "gromacs/mdrun/simulationcontext.h"
 #include "gromacs/mdrunutility/handlerestart.h"
 #include "gromacs/mdrunutility/logging.h"
-#include "gromacs/mdtypes/commrec.h"
+#include "gromacs/mdrunutility/multisim.h"
 #include "gromacs/utility/arrayref.h"
 #include "gromacs/utility/smalloc.h"
 
@@ -211,24 +212,30 @@ int gmx_mdrun(int argc, char *argv[])
         return 0;
     }
 
-    StartingBehavior startingBehavior = StartingBehavior::NewSimulation;
-    LogFilePtr       logFileGuard     = nullptr;
+    ArrayRef<const std::string> multiSimDirectoryNames =
+        opt2fnsIfOptionSet("-multidir", ssize(options.filenames), options.filenames.data());
+
+    // Set up the communicator, where possible (see docs for
+    // SimulationContext).
+    MPI_Comm communicator = GMX_LIB_MPI ? MPI_COMM_WORLD : MPI_COMM_NULL;
+    // The SimulationContext is necessary with gmxapi so that
+    // resources owned by the client code can have suitable
+    // lifetime. The gmx wrapper binary uses the same infrastructure,
+    // but the lifetime is now trivially that of the invocation of the
+    // wrapper binary.
+    SimulationContext simulationContext(communicator, multiSimDirectoryNames);
+
+    StartingBehavior  startingBehavior = StartingBehavior::NewSimulation;
+    LogFilePtr        logFileGuard     = nullptr;
+    gmx_multisim_t   *ms               = simulationContext.multiSimulation_.get();
     std::tie(startingBehavior,
-             logFileGuard) = handleRestart(options.cr.get(),
-                                           options.ms.get(),
+             logFileGuard) = handleRestart(findIsSimulationMasterRank(ms, communicator),
+                                           communicator,
+                                           ms,
                                            options.mdrunOptions.appendingBehavior,
                                            ssize(options.filenames),
                                            options.filenames.data());
 
-    /* The SimulationContext is a resource owned by the client code.
-     * A more complete design should address handles to resources with appropriate
-     * lifetimes and invariants for the resources allocated to the client,
-     * to the current simulation and to scheduled tasks within the simulation.
-     *
-     * \todo Clarify Context lifetime-management requirements and reconcile with scoped ownership.
-     */
-    auto simulationContext = createSimulationContext(std::move(options.cr));
-
     /* The named components for the builder exposed here are descriptive of the
      * state of mdrun at implementation and are not intended to be prescriptive
      * of future design. (Note the ICommandLineOptions... framework used elsewhere.)
@@ -240,7 +247,8 @@ int gmx_mdrun(int argc, char *argv[])
      * We would prefer to rebuild resources only as necessary, but we defer such
      * details to future optimizations.
      */
-    auto builder = MdrunnerBuilder(std::move(mdModules));
+    auto builder = MdrunnerBuilder(std::move(mdModules),
+                                   compat::not_null<SimulationContext*>(&simulationContext));
     builder.addSimulationMethod(options.mdrunOptions, options.pforce, startingBehavior);
     builder.addDomainDecomposition(options.domdecOptions);
     // \todo pass by value
@@ -251,10 +259,6 @@ int gmx_mdrun(int argc, char *argv[])
     builder.addUpdateTaskAssignment(options.update_opt_choices[0]);
     builder.addNeighborList(options.nstlist_cmdline);
     builder.addReplicaExchange(options.replExParams);
-    // \todo take ownership of multisim resources (ms)
-    builder.addMultiSim(options.ms.get());
-    builder.addCommunicationRecord(simulationContext.communicationRecord_.get());
-    // \todo Provide parallelism resources through SimulationContext.
     // Need to establish run-time values from various inputs to provide a resource handle to Mdrunner
     builder.addHardwareOptions(options.hw_opt);
     // \todo File names are parameters that should be managed modularly through further factoring.