1 The modular simulator {#page_modularsimulator}
2 ==============================================
4 A new modular approach to the GROMACS simulator is described. The
5 simulator in GROMACS is the object which carries out a simulation. The
6 simulator object is created and owned by the runner object, which is
7 outside of the scope of this new approach, and will hence not be further
8 described. The simulator object provides access to some generally used
9 data, most of which is owned by the runner object.
11 ## Using the modular simulator
12 GROMACS will automatically use the modular simulator for the velocity
13 verlet integrator (`integrator = md-vv`), if the functionality chosen
14 in the other input parameters is implemented in the new framework.
15 Currently, this includes NVE, NVT, NPH, and NPT simulations,
16 with or without free energy perturbation, using thermodynamic
19 * `tcoupl`: `no`, `v-rescale`, or `berendsen`
20 * `pcoupl`: `no` or `parrinello-rahman`
22 To disable the modular simulator for cases defaulting to the new framework,
23 the environment variable `GMX_DISABLE_MODULAR_SIMULATOR=ON` can be set. To
24 use the new framework also for `integrator = md` (where the functionality is
25 implemented), the environment variable `GMX_USE_MODULAR_SIMULATOR=ON` can
26 be set to override legacy default.
28 ## Legacy implementation
30 In the legacy implementation, the simulator consisted of a number of
31 independent functions carrying out different type of simulations, such
32 as `do_md` (MD simulations), `do_cg` and `do_steep` (minimization),
33 `do_rerun` (force and energy evaluation of simulation trajectories),
34 `do_mimic` (MiMiC QM/MM simulations), `do_nm` (normal mode analysis),
35 and `do_tpi` (test-particle insertion).
37 The legacy approach has some obvious drawbacks:
38 * *Data management:* Each of the `do_*` functions defines local data,
39 including complex objects encapsulating some data and functionality,
40 but also data structures effectively used as "global variables" for
41 communication between different parts of the simulation. Neither the
42 ownership nor the access rights (except for `const` qualifiers) are
44 * *Dependencies:* Many function calls in the `do_*` functions are
45 dependent on each others, i.e. rely on being called in a specific
46 order, but these dependencies are not clearly defined.
47 * *Branches:* The flow of the `do_*` functions are hard to understand
48 due to branching. At setup time, and then at every step of the
49 simulation run, a number of booleans are set (e.g. `bNS` (do neighbor
50 searching), `bCalcEner` (calculate energies), `do_ene` (write
51 energies), `bEner` (energy calculation needed), etc). These booleans
52 enable or disable branches of the code (for the current step or the
53 entire run), mostly encoded as `if(...)` statements in the main `do_*`
54 loop, but also in functions called from there.
55 * *Task scheduling:* Poorly defined dependencies and per-step branching
56 make task scheduling (e.g. parallel execution of independent tasks)
58 * *Error-prone for developers:* Poorly defined dependencies and unclear
59 code flow make changing the simulator functions very error-prone,
60 rendering the implementation of new methods tedious.
62 ## The modular simulator approach
64 The main design goals of the new, fully modular simulator approach
66 * *Extensibility:* We want to ease maintenance and the implementation
67 of new integrator schemes.
68 * *Monte Carlo:* We want to add MC capability, which can be mixed with
69 MD to create hybrid MC/MD schemes.
70 * *Data locality & interfaces:* We aim at localizing data in objects,
71 and offer interfaces if access from other objects is needed.
72 * *Multi-stepping:* We aim at a design which intrinsically supports
73 multi-step integrators, e.g. having force calls at different
74 frequencies, or avoid having branches including rare events
75 (trajectory writing, neighbor search, ...) in the computation loops.
76 * *Task parallelism:* Although not first priority, we want to have a
77 design which can be extended to allow for task parallelism.
79 The general design approach is that of a **task scheduler**. *Tasks*
80 are argument-less functions which perform a part of the computation.
81 Periodically during the simulation, the scheduler builds a
82 *queue of tasks*, i.e. a list of tasks which is then run through in
83 order. Over time, with data dependencies clearly defined, this
84 approach can be modified to have independent tasks run in parallel.
86 ### Simulator elements
88 The task scheduler holds a list of *simulator elements*, defined by
89 the `ISimulatorElement` interface. These elements have a
90 `scheduleTask(Step, Time)` function, which gets called by the task
91 scheduler. This allows the simulator element to register one (or more)
92 function pointers to be run at that specific `(Step, Time)`. From the
93 point of view of the element, it is important to note that the
94 computation will not be carried out immediately, but that it will be
95 called later during the actual (partial) simulation run. From the
96 point of view of the builder of the task scheduler, it is important to
97 note that the order of the elements determines the order in which
98 computation is performed.
100 class ISimulatorElement
103 /*! \\brief Query whether element wants to run at step / time
105 * Element can register one or more functions to be run at that step through
106 * the registration pointer.
108 virtual void scheduleTask(Step, Time, const RegisterRunFunction&) = 0;
109 //! Method guaranteed to be called after construction, before simulator run
110 virtual void elementSetup() = 0;
111 //! Method guaranteed to be called after simulator run, before deconstruction
112 virtual void elementTeardown() = 0;
113 //! Standard virtual destructor
114 virtual ~ISimulatorElement() = default;
118 The task scheduler periodically loops over
119 its list of elements, builds a queue of function pointers to run, and
120 returns this list of tasks. As an example, a possible application
121 would be to build a new queue after each domain-decomposition (DD) /
122 neighbor-searching (NS) step, which might occur every 100 steps. The
123 scheduler would loop repeatedly over all its elements, with elements
124 like the trajectory-writing element registering for only one or no
125 step at all, the energy-calculation element registering for every
126 tenth step, and the force, position / velocity propagation, and
127 constraining algorithms registering for every step. The result would
128 be a (long) queue of function pointers including all computations
129 needed until the next DD / NS step, which can be run without any
134 Some elements might require computations by other elements. If for
135 example, the trajectory writing is an element independent from the
136 energy-calculation element, it needs to signal to the energy element
137 that it is about to write a trajectory, and that the energy element
138 should be ready for that (i.e. perform an energy calculation in the
139 upcoming step). This requirement, which replaces the boolean branching
140 in the current implementation, is fulfilled by a Signaller - Client
141 model. Classes implementing the `ISignaller` interface get called
142 *before* every loop of the element list, and can inform registered
143 clients about things happening during that step. The trajectory
144 element, for example, can tell the energy element that it will write
145 to trajectory at the end of this step. The energy element can then
146 register an energy calculation during that step, being ready to write
147 to trajectory when requested.
152 //! Function run before every step of scheduling
153 virtual void signal(Step, Time) = 0;
154 //! Method guaranteed to be called after construction, before simulator run
155 virtual void setup() = 0;
158 ### The modular simulator
160 The approach is most easily displayed using some simplified (pseudo) code.
162 The simulator itself is responsible to **store the elements in the
163 right order** (in `addIntegrationElements`) This includes the different
164 order of elements in different algorithms (e.g. leap-frog vs. velocity
165 verlet), but also logical dependencies (energy output after compute
166 globals). Once the algorithm has been built, the simulator simply
167 executes one task after the next, until the end of the simulation is
170 class ModularSimulator : public ISimulator
173 //! Run the simulator
177 void ModularSimulator::run()
180 ModularSimulatorAlgorithmBuilder algorithmBuilder();
181 addIntegrationElements(&algorithmBuilder);
182 auto algorithm = algorithmBuilder.build();
184 while (const auto* task = algorithm.getNextTask())
191 The following snippet illustrates building a leap-frog integration
192 algorithm. The algorithm builder allows for a concise description of
193 the simulator algorithm.
195 void ModularSimulator::addIntegrationElements(ModularSimulatorAlgorithmBuilder* builder)
197 if (legacySimulatorData_->inputrec->eI == eiMD)
199 // The leap frog integration algorithm
200 builder->add<ForceElement>();
201 // We have a full state here (positions(t), velocities(t-dt/2), forces(t)
202 builder->add<StatePropagatorData::Element>();
203 if (legacySimulatorData_->inputrec->etc == TemperatureCoupling::VRescale)
205 builder->add<VRescaleThermostat>(-1, VRescaleThermostatUseFullStepKE::No);
207 builder->add<Propagator<IntegrationStep::LeapFrog>>(legacySimulatorData_->inputrec->delta_t,
208 RegisterWithThermostat::True,
209 RegisterWithBarostat::True);
210 if (legacySimulatorData_->constr)
212 builder->add<ConstraintsElement<ConstraintVariable::Positions>>();
214 builder->add<ComputeGlobalsElement<ComputeGlobalsAlgorithm::LeapFrog>>();
215 // We have the energies at time t here
216 builder->add<EnergyData::Element>();
217 if (legacySimulatorData_->inputrec->epc == PressureCoupling::ParrinelloRahman)
219 builder->add<ParrinelloRahmanBarostat>(-1);
224 ### The simulator algorithm
226 The simulator algorithm is responsible to **decide if elements need to
227 run at a specific time step**. The elements get called in order, and
228 decide whether they need to run at a specific step. This can be
229 pre-computed for multiple steps. In the current implementation, the
230 tasks are pre-computed for the entire life-time of the neighbor
233 The simulator algorithm offers functionality to get the next task
234 from the queue. It owns all elements involved in the simulation
235 and is hence controlling their lifetime. This ensures that pointers and
236 callbacks exchanged between elements remain valid throughout the duration
237 of the simulation run. It also maintains the list of tasks,
238 and updates it when needed.
240 class ModularSimulatorAlgorithm
243 //! Get next task in queue
244 [[nodiscard]] const SimulatorRunFunction* getNextTask();
246 //! List of signalers
247 std::vector<std::unique_ptr<ISignaller>> signallerList_;
249 std::vector<std::unique_ptr<ISimulatorElement>> elementsList_;
252 std::vector<SimulatorRunFunction> taskQueue_;
253 //! The task iterator
254 std::vector<SimulatorRunFunction>::const_iterator taskIterator_;
256 //! Update task queue
257 void updateTaskQueue();
260 The `getNextTask()` function is returning the next task in the task
261 queue. It rebuilds the task list when needed.
263 const SimulatorRunFunction* ModularSimulatorAlgorithm::getNextTask()
265 if (!taskQueue_.empty())
269 if (taskIterator_ == taskQueue_.end())
276 taskIterator_ = taskQueue_.begin();
278 return &*taskIterator_;
281 Updating the task queue involves calling all signallers and
282 elements for every step of the scheduling period. This refills
283 the task queue. It is important to keep in mind that the *scheduling step* is not
284 necessarily identical to the *current step* of the simulation. Most of
285 the time, the scheduling step is ahead, as we are pre-scheduling steps.
287 void ModularSimulatorAlgorithm::updateTaskQueue()
289 for (Step schedulingStep = currentStep;
290 schedulingStep < currentStep + schedulingPeriod;
293 Time time = getTime(schedulingStep);
294 // Have signallers signal any special treatment of scheduling step
295 for (const auto& signaller : signallerList)
297 signaller.signal(schedulingStep, time);
299 // Query all elements whether they need to run at scheduling step
300 for (const auto& element : signallerList)
302 element.schedule(schedulingStep, time, registerRunFunction_);
307 ### Sequence diagrams
310 In the loop preparation, the signallers and elements are created and
311 stored in the right order. The signallers and elements can then
312 perform any setup operations needed.
317 ModularSimulatorBuilder [label="ModularSimulatorAlgorithmBuilder"],
318 ModularSimulator [label="ModularSimulatorAlgorithm"],
319 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
320 Elements [label="ModularSimulatorAlgorithm::\nElements"],
321 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
323 --- [ label = "constructElementsAndSignallers()" ];
324 ModularSimulatorBuilder => Signallers [ label = "Create signallers\nand order them" ];
325 ModularSimulatorBuilder => Elements [ label = "Create elements\nand order them" ];
326 --- [ label = "constructElementsAndSignallers()" ];
330 --- [ label = "setupAllElements()" ];
331 ModularSimulator => Signallers [ label = "Call setup()" ];
332 Signallers box Signallers [ label = "for signaler in Signallers\n signaller->setup()" ];
334 ModularSimulator => Elements [ label = "Call setup()" ];
335 Elements box Elements [ label = "for element in Elements\n element->setup()" ];
336 --- [ label = "setupAllElements()" ];
340 The main loop consists of two parts which are alternately run until the
341 simulation stop criterion is met. The first part is the population of
342 the task queue, which determines all tasks that will have to run to
343 simulate the system for a given time period. In the current implementation,
344 the scheduling period is set equal to the lifetime of the neighborlist.
345 Once the tasks have been predetermined, the simulator runs them in order.
346 This is the actual simulation computation, which can now run without any
352 ModularSimulator [label="ModularSimulatorAlgorithm"],
353 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
354 Elements [label="ModularSimulatorAlgorithm::\nElements"],
355 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
357 ModularSimulator box TaskQueue [ label = "loop: while(not lastStep)" ];
358 ModularSimulator note TaskQueue [ label = "The task queue is empty. The simulation state is at step N.", textbgcolor="yellow" ];
361 ModularSimulator box ModularSimulator [ label = "populateTaskQueue()" ];
362 ModularSimulator =>> TaskQueue [ label = "Fill task queue with tasks until next neighbor-searching step" ];
365 ModularSimulator note TaskQueue [ label = "The task queue now holds all tasks needed to move the simulation from step N to step N + nstlist. The simulation for these steps has not been performed yet, however. The simulation state is hence still at step N.", textbgcolor="yellow" ];
369 ModularSimulator => TaskQueue [ label = "Run all tasks in TaskQueue" ];
370 TaskQueue box TaskQueue [label = "for task in TaskQueue\n run task" ];
371 TaskQueue note TaskQueue [ label = "All simulation computations are happening in this loop!", textbgcolor="yellow" ];
374 ModularSimulator note TaskQueue [ label = "The task queue is now empty. The simulation state is at step N + nstlist.", textbgcolor="yellow" ];
375 ModularSimulator box TaskQueue [ label = "end loop: while(not lastStep)" ];
380 A part of the main loop, the task scheduling in `populateTaskQueue()`
381 allows the elements to push tasks to the task queue. For every scheduling
382 step, the signallers are run first to give the elements information about
383 the upcoming scheduling step. The scheduling routine elements are then
384 called in order, allowing the elements to register their respective tasks.
389 ModularSimulator [label="ModularSimulatorAlgorithm"],
390 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
391 Elements [label="ModularSimulatorAlgorithm::\nElements"],
392 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
394 --- [ label = "populateTaskQueue()" ];
395 ModularSimulator box ModularSimulator [ label = "doDomainDecomposition()\ndoPmeLoadBalancing()" ];
396 ModularSimulator =>> Elements [ label = "Update state and topology" ];
400 ModularSimulator note ModularSimulator [ label = "schedulingStep == N\nsimulationStep == N", textbgcolor="yellow" ];
401 ModularSimulator box TaskQueue [ label = "loop: while(not nextNeighborSearchingStep)" ];
402 ModularSimulator => Signallers [ label = "Run signallers for schedulingStep" ];
403 Signallers box Signallers [label = "for signaller in Signallers\n signaller->signal(scheduleStep)" ];
404 Signallers =>> Elements [ label = "notify" ];
405 Signallers note Elements [ label = "The elements now know if schedulingStep has anything special happening, e.g. neighbor searching, log writing, trajectory writing, ...", textbgcolor="yellow" ];
409 ModularSimulator => Elements [ label = "Schedule run functions for schedulingStep" ];
410 Elements box Elements [label = "for element in Elements\n element->scheduleTask(scheduleStep)" ];
411 Elements =>> TaskQueue [ label = "Push task" ];
412 Elements note TaskQueue [ label = "The elements have now registered everything they will need to do for schedulingStep.", textbgcolor="yellow" ];
413 ModularSimulator note ModularSimulator [ label = "schedulingStep++", textbgcolor="yellow" ];
415 ModularSimulator box TaskQueue [ label = "end loop: while(not nextNeighborSearchingStep)" ];
416 --- [ label = "populateTaskQueue()" ];
417 ModularSimulator note ModularSimulator [ label = "schedulingStep == N + nstlist\nsimulationStep == N", textbgcolor="yellow" ];
421 ## Acceptance tests and further plans
423 Acceptance tests which need to be
424 fulfilled to make the modular simulator the default code path:
425 * End-to-end tests pass on both `do_md` and the new loop in
426 Gitlab pre- and post-submit pipelines
427 * Physical validation cases pass on the new loop
428 * Performance on different sized benchmark cases, x86 CPU-only
429 and NVIDIA GPU are at most 1% slower -
430 https://github.com/ptmerz/gmxbenchmark has been developed to
433 After the MD bare minimum, we will want to add support for
435 * Full support of GPU (current implementation does not support
438 Using the new modular simulator framework, we will then explore
439 adding new functionality to GROMACS, including
440 * Monte Carlo barostat
441 * hybrid MC/MD schemes
442 * multiple-time-stepping integration
444 We will also explore optimization opportunities, including
445 * re-use of the same queue if conditions created by user input are
446 sufficiently favorable (by design or when observed)
447 * simultaneous execution of independent tasks
449 We will probably not prioritize support for (and might consider
450 deprecating from do_md in a future GROMACS version)
451 * Simulated annealing
453 * Simulated tempering
458 * Fancy mdp options for FEP output
461 * Constant acceleration groups
462 * Ensemble-averaged restraints
463 * Time-averaged restraints
464 * Freeze, deform, cos-acceleration
466 ## Signaller and element details
468 The current implementation of the modular simulator consists of
469 the following signallers and elements:
473 All signallers have a list of pointers to clients, objects that
474 implement a respective interface and get notified of events the
475 signaller is communicating.
477 * `NeighborSearchSignaller`: Informs its clients whether the
478 current step is a neighbor-searching step.
479 * `LastStepSignaller`: Informs its clients when the current step
480 is the last step of the simulation.
481 * `LoggingSignaller`: Informs its clients whether output to the
482 log file is written in the current step.
483 * `EnergySignaller`: Informs its clients about energy related
484 special steps, namely energy calculation steps, virial
485 calculation steps, and free energy calculation steps.
486 * `TrajectorySignaller`: Informs its clients if writing to
487 trajectory (state [x/v/f] and/or energy) is planned for the
490 ### Simulator Elements
492 #### `TrajectoryElement`
493 The `TrajectoryElement` is calling its trajectory clients, passing them a valid
494 output pointer and letting them write to trajectory. Unlike the
495 legacy implementation, the trajectory element itself knows nothing
496 about the data that is written to file - it is only responsible
497 to inform clients about trajectory steps, and providing a valid
498 file pointer to the objects that need to write to trajectory.
500 #### `ComputeGlobalsElement`
501 The `ComputeGlobalsElement` encapsulates the legacy calls to
502 `compute_globals`. While a new approach to the global reduction
503 operations has been discussed, it is currently not part of this
504 effort. This element therefore aims at offering an interface
505 to the legacy implementation which is compatible with the new
508 The element currently comes in 3 (templated) flavors: the leap-frog case,
509 the first call during a velocity-verlet integrator, and the second call
510 during a velocity-verlet integrator. It is the responsibility of the
511 simulator builder to place them at the right place of the
512 integration algorithm.
514 #### `ForceElement` and `ShellFCElement`
515 The `ForceElement` and the `ShellFCElement` encapsulate the legacy
516 calls to `do_force` and `do_shellfc`, respectively. It is the
517 responsibility of the simulator builder to place them at the right
518 place of the integration algorithm. Moving forward, a version of these
519 elements which would allow calling of do_force with subsets of the topology
520 would be desirable to pave the way towards multiple time step integrators
521 within modular simulator, allowing to integrate slower degrees of freedom
522 at a different frequency than faster degrees of freedom.
524 #### `ConstraintElement`
525 The constraint element is implemented for the two cases of constraining
526 both positions and velocities, and only velocities. It does not change the constraint
527 implementation itself, but replaces the legacy `constrain_coordinates`
528 and `constrain_velocities` calls from update.h by elements implementing
529 the ISimulatorElement interface and using the new data management.
532 The propagator element can, through templating, cover the different
533 propagation types used in the currently implemented MD schemes. The
534 combination of templating, static functions, and having only the
535 inner-most operations in the static functions allows to have performance
536 comparable to fused update elements while keeping easily re-orderable
539 Currently, the (templated) implementation covers four cases:
540 * *PositionsOnly:* Moves the position vector by the given time step
541 * *VelocitiesOnly:* Moves the velocity vector by the given time step
542 * *LeapFrog:* A manual fusion of the previous two propagators
543 * *VelocityVerletPositionsAndVelocities:* A manual fusion of VelocitiesOnly
544 and PositionsOnly, where VelocitiesOnly is only propagated by half the
545 time step of PositionsOnly.
547 The propagators also allow to implement temperature and pressure coupling
548 schemes by offering (templated) scaling of the velocities.
550 #### `CompositeSimulatorElement`
551 The composite simulator element takes a list of elements and implements
552 the ISimulatorElement interface, making a group of elements effectively
553 behave as one. This simplifies building algorithms.
555 #### `VRescaleThermostat`
556 The `VRescaleThermostat` implements the v-rescale thermostat. It takes a
557 callback to the propagator and updates the velocity scaling factor
558 according to the v-rescale thermostat formalism.
560 #### `ParrinelloRahmanBarostat`
561 The `ParrinelloRahmanBarostat` implements the Parrinello-Rahman barostat.
562 It integrates the Parrinello-Rahman box velocity equations, takes a
563 callback to the propagator to update the velocity scaling factor, and
564 scales the box and the positions of the system.
566 #### `StatePropagatorData::Element`
567 The `StatePropagatorData::Element` takes part in the simulator run, as it might
568 have to save a valid state at the right moment during the
569 integration. Placing the StatePropagatorData correctly is for now the
570 duty of the simulator builder - this might be automated later
571 if we have enough meta-data of the variables (i.e., if
572 `StatePropagatorData` knows at which time the variables currently are,
573 and can decide when a valid state (full-time step of all
574 variables) is reached. The `StatePropagatorData::Element` is also a client of
575 both the trajectory signaller and writer - it will save a
576 state for later writeout during the simulator step if it
577 knows that trajectory writing will occur later in the step,
578 and it knows how to write to file given a file pointer by
579 the `TrajectoryElement`.
581 #### `EnergyData::Element`
582 The `EnergyData::Element` takes part in the simulator run,
583 either adding data (at energy calculation steps), or
584 recording a non-calculation step (all other steps). It is the
585 responsibility of the simulator builder to ensure that the
586 `EnergyData::Element` is called at a point of the simulator run
587 at which it has access to a valid energy state.
589 It subscribes to the trajectory signaller, the energy signaller,
590 and the logging signaller to know when an energy calculation is
591 needed and when a non-recording step is enough. The EnergyData
592 element is also a subscriber to the trajectory writer element,
593 as it is responsible to write energy data to trajectory.
595 #### `FreeEnergyPerturbationData::Element`
596 The `FreeEnergyPerturbationData::Element` is a member class of
597 `FreeEnergyPerturbationData` that updates the lambda
598 values during the simulation run if lambda is non-static. It
599 implements the checkpointing client interface to save the current
600 state of `FreeEnergyPerturbationData` for restart.
604 ### `StatePropagatorData`
605 The `StatePropagatorData` contains a little more than the pure
606 statistical-physical micro state, namely the positions,
607 velocities, forces, and box matrix, as well as a backup of
608 the positions and box of the last time step. While it takes
609 part in the simulator loop to be able to backup positions /
610 boxes and save the current state if needed, it's main purpose
611 is to offer access to its data via getter methods. All elements
612 reading or writing to this data need a pointer to the
613 `StatePropagatorData` and need to request their data explicitly. This
614 will later simplify the understanding of data dependencies
617 Note that the `StatePropagatorData` can be converted to and from the
618 legacy `t_state` object. This is useful when dealing with
619 functionality which has not yet been adapted to use the new
620 data approach. Of the elements currently implemented, only
621 domain decomposition, PME load balancing, and the initial
622 constraining are using this.
625 The EnergyData owns the EnergyObject, and is hence responsible
626 for saving energy data and writing it to trajectory.
627 The EnergyData offers an interface to add virial contributions,
628 but also allows access to the raw pointers to tensor data, the
629 dipole vector, and the legacy energy data structures.
631 ### `FreeEnergyPerturbationData`
632 The `FreeEnergyPerturbationData` holds the lambda vector and the
633 current FEP state, offering access to its values via getter
636 ## Simulator algorithm builder
637 Elements that define the integration algorithm (i.e. which are
638 added using the templated `ModularSimulatorAlgorithmBuilder::add`
639 method) need to implement a `getElementPointerImpl` factory function.
640 This gives them access to the data structures and some other
641 infrastructure, but also allows elements to accept additional
642 arguments (e.g frequency, offset, ...).
644 template<typename Element, typename... Args>
645 void ModularSimulatorAlgorithmBuilder::add(Args&&... args)
647 // Get element from factory method
648 auto* element = static_cast<Element*>(getElementPointer<Element>(
649 legacySimulatorData_, &elementAdditionHelper_, statePropagatorData_.get(),
650 energyData_.get(), freeEnergyPerturbationData_.get(), &globalCommunicationHelper_,
651 std::forward<Args>(args)...));
653 // Make sure returned element pointer is owned by *this
654 // Ensuring this makes sure we can control the life time
655 if (!elementExists(element))
657 throw ElementNotFoundError("Tried to append non-existing element to call list.");
660 // Register element with infrastructure
663 Note that `getElementPointer<Element>` will call `Element::getElementPointerImpl`,
664 which needs to be implemented by the different elements.
667 ### `DomDecHelper` and `PmeLoadBalanceHelper`
668 These infrastructure elements are responsible for domain decomposition and
669 PME load balancing, respectively. They encapsulate function calls which are
670 important for performance, but outside the scope of this effort. They rely
671 on legacy data structures for the state (both) and the topology (domdec).
673 The elements do not implement the ISimulatorElement interface, as
674 the Simulator is calling them explicitly between task queue population
675 steps. This allows elements to receive the new topology / state before
676 deciding what functionality they need to run.
679 The `CheckpointHelper` is responsible to write checkpoints, and to offer
680 its clients access to the data read from checkpoint.
682 Writing checkpoints is done just before neighbor-searching (NS) steps,
683 or before the last step. Checkpointing occurs periodically (by default,
684 every 15 minutes), and needs two NS steps to take effect - on the first
685 NS step, the checkpoint helper on master rank signals to all other ranks
686 that checkpointing is about to occur. At the next NS step, the checkpoint
687 is written. On the last step, checkpointing happens immediately before the
688 step (no signalling). To be able to react to last step being signalled,
689 the CheckpointHelper also implements the `ISimulatorElement` interface,
690 but only registers a function if the last step has been called.
692 Checkpointing happens at the top of a simulation step, which gives a
693 straightforward re-entry point at the top of the simulator loop.
695 #### Implementation details
696 ##### Other (older) approaches
697 **Legacy checkpointing approach:** All data to be checkpointed needs to be
698 stored in one of the following data structures:
699 * `t_state`, which also holds pointers to
700 - `history_t` (history for restraints)
701 - `df_history_t` (history for free energy)
704 * `ObservableHistory`, consisting of
709 * Checkpoint further saves details about the output files being used
711 These data structures are then serialized by a function having knowledge of
712 their implementation details. One possibility to add data to the checkpoint
713 is to expand one of the objects that is currently being checkpointed, and
714 edit the respective `do_cpt_XXX` function in `checkpoint.cpp` which interacts
715 with the XDR library. The alternative would be to write an entirely new data
716 structure, changing the function signature of all checkpoint-related functions,
717 and write a corresponding low-level routine interacting with the XDR library.
719 **The MdModule approach:** To allow for modules to write checkpoints, the legacy
720 checkpoint was extended by a KVTree. When writing to checkpoint, this tree gets
721 filled (via callbacks) by the single modules, and then serialized. When reading,
722 the KVTree gets deserialized, and then distributed to the modules which can read
723 back the data previously stored.
725 ##### Modular simulator design
727 The MdModule checks off almost all requirements to a modularized checkpointing format.
728 The proposed design is therefore an evolved form of this approach. Notably, two
730 * Hide the implementation details of the data structure holding the data (currently,
731 a KV-Tree) from the clients. This allows to change the implementation details of
732 reading / writing checkpoints without touching client code.
733 * Offer a unified way to read and write to data, allowing clients to write one
734 (templated) function to read to and write from checkpoint. This allows to
735 eliminate code duplication and the danger of having read and write functions
738 The modular simulator checkpointing does not currently change the way that the
739 legacy simulator is checkpointing. Some data structures involved in the legacy
740 checkpoint did, however, get an implementation of the new approach. This is
741 needed for ModularSimulator checkpointing, but also gives a glimpse of how
742 implementing this for legacy data structures would look like.
744 The most important design part is the `CheckpointData` class. It exposes methods
745 to read and write scalar values, ArrayRefs, and tensors. It also allows to create
746 a "sub-object" of the same type `CheckpointData` which allows to have more complex
747 members implement their own checkpointing routines (without having to be aware that
748 they are a member). All methods are templated on the chosen operation,
749 `CheckpointDataOperation::Read` or `CheckpointDataOperation::Write`, allowing clients
750 to use the same code to read and write to checkpoint. Type traits and constness are
751 used to catch as many errors as possible at compile time. `CheckpointData` uses a
752 KV-tree to store the data internally. This is however never exposed to the client.
753 Having this abstraction layer gives freedom to change the internal implementation
756 All `CheckpointData` objects are owned by a `ReadCheckpointDataHolder` or
757 `WriteCheckpointDataHolder`. These holder classes own the internal KV-tree, and offer
758 `deserialize(ISerializer*)` and `serialize(ISerializer*)` functions, respectively,
759 which allow to read from / write to file. This separation clearly defines ownership
760 and separates the interface aimed at file IO from the interface aimed at objects
761 reading / writing checkpoints.
763 Checkpointing for modular simulator is tied in the general checkpoint facility by
764 passing a `ReadCheckpointDataHolder` or `WriteCheckpointDataHolder` object to the
765 legacy checkpoint read and write operations.
767 ##### Notes about the modular simulator checkpointing design
769 **Distinction of data between clients:** The design requires that separate
770 clients have independent sub-`CheckpointData` objects identified by a unique key.
771 This key is the only thing that needs to be unique between clients, i.e. clients are
772 free to use any key _within_ their sub-`CheckpointData` without danger to overwrite
773 data used by other clients.
775 **Versioning:** The design relies on clients keeping their own versioning system
776 within their sub-`CheckpointData` object. As the data stored by clients is opaque
777 to the top level checkpointing facility, it has no way to know when the internals
778 change. Only fundamental changes to the checkpointing architecture can still be
779 tracked by a top-level checkpoint version.
781 **Key existence:** The currently uploaded design does not allow to check whether
782 a key is present in `CheckpointData`. This could be introduced if needed - however,
783 if clients write self-consistent read and write code, this should never be needed.
784 Checking for key existence seems rather to be a lazy way to circumvent versioning,
785 and is therefore discouraged.
787 **Callback method:** The modular simulator and MdModules don't use the exact same
788 way of communicating with clients. The two methods could be unified if needed.
789 The only _fundamental_ difference is that modular simulator clients need to identify
790 with a unique key to receive their dedicated sub-data, while MdModules all read from
791 and write to the same KV-tree. MdModules could be adapted to that by either requiring
792 a unique key from the modules, or by using the same `CheckpointData` for all modules
793 and using a single unique key (something like "MdModules") to register that object
794 with the global checkpoint.
796 **Performance:** One of the major differences between the new approach and the legacy
797 checkpointing is that data gets _copied_ into `CheckpointData`, while the legacy
798 approach only took a pointer to the data and serialized it. This slightly reduces
799 performance. Some thoughts on that:
800 * By default, checkpointing happens at the start of the simulation (only if reading
801 from checkpoint), every 15 minutes during simulation runs, and at the end of the
802 simulation run. This makes it a low priority target for optimization. Consequently,
803 not much thoughts have been put in the optimization, but there's certainly some way
804 to improve things here and there if we consider it necessary.
805 * The copying will only have measurable effect when large data gets checkpointed -
806 likely only for saving the positions / velocities of the entire state, so that
807 should be the first target for optimization if needed.
808 * Copying data could have advantages moving forward - we could continue the
809 simulation as soon as the data is written to the `CheckpointData` object, and don't
810 necessarily need to wait for writing to the physical medium to happen. It also
811 simplifies moving the point at which checkpointing is performed within the
812 integrator. One could envision clients storing their data any time during the
813 integration step, and serializing the resulting `CheckpointData` after the step.
814 This avoids the need to find a single point within the loop at which all clients
815 need to be in a state suitable for checkpointing.
816 * If, however, we wanted to use the same approach for other, more frequent
817 (and hence more perfomance critical) operations such as saving/restoring states
818 for MC type algorithms or swapping of states between running simulations in
819 multi-sim type settings, performance would become more of an issue.
821 **ISerializer vs KV-tree:** The new approach uses a KV tree internally. The
822 KV tree is well suited to represent the design philosophy of the approach:
823 Checkpointing offers a functionality which allows clients to write/read any data
824 they want. This data remains opaque to the checkpointing element. Clients can
825 read or write in any order, and in the future, maybe even simultaneously. Data
826 written by any element should be protected from access from other elements to
827 avoid bugs. The downside of the KV tree is that all data gets copied before
828 getting written to file (see above).
830 Replacing a KV tree by a single ISerializer object which gets passed to clients
831 would require elements to read and write sequentially in a prescribed order. With
832 the help of InMemorySerializer, a KV-Tree could likely be emulated (with sub-objects
833 that serialize to memory, and then a parent object that serializes this memory to
834 file), but that doesn't present a clear advantage anymore.
837 The topology object owns the local topology and holds a constant reference
838 to the global topology owned by the ISimulator.
840 The local topology is only infrequently changed if domain decomposition is
841 on, and never otherwise. The topology holder therefore offers elements to register
842 as ITopologyHolderClients. If they do so, they get a handle to the updated local
843 topology whenever it is changed, and can rely that their handle is valid
844 until the next update. The domain decomposition element is defined as friend
845 class to be able to update the local topology when needed.