docs/doxygen/lib/modularsimulator.md

   1 The modular simulator {#page_modularsimulator}
   2 ==============================================
   3
   4 A new modular approach to the GROMACS simulator is described. The
   5 simulator in GROMACS is the object which carries out a simulation. The
   6 simulator object is created and owned by the runner object, which is
   7 outside of the scope of this new approach, and will hence not be further
   8 described. The simulator object provides access to some generally used
   9 data, most of which is owned by the runner object.
  10
  11 ## Using the modular simulator
  12 GROMACS will automatically use the modular simulator for the velocity
  13 verlet integrator (`integrator = md-vv`), if the functionality chosen
  14 in the other input parameters is implemented in the new framework.
  15 Currently, this includes NVE, NVT, NPH, and NPT simulations,
  16 with or without free energy perturbation, using thermodynamic
  17 boundary conditions
  18
  19 * `tcoupl`: `no`, `v-rescale`, or `berendsen`
  20 * `pcoupl`: `no` or `parrinello-rahman`
  21
  22 To disable the modular simulator for cases defaulting to the new framework,
  23 the environment variable `GMX_DISABLE_MODULAR_SIMULATOR=ON` can be set. To
  24 use the new framework also for `integrator = md` (where the functionality is
  25 implemented), the environment variable `GMX_USE_MODULAR_SIMULATOR=ON` can
  26 be set to override legacy default.
  27
  28 ## Legacy implementation
  29
  30 In the legacy implementation, the simulator consisted of a number of
  31 independent functions carrying out different type of simulations, such
  32 as `do_md` (MD simulations), `do_cg` and `do_steep` (minimization),
  33 `do_rerun` (force and energy evaluation of simulation trajectories),
  34 `do_mimic` (MiMiC QM/MM simulations), `do_nm` (normal mode analysis),
  35 and `do_tpi` (test-particle insertion).
  36
  37 The legacy approach has some obvious drawbacks:
  38 * *Data management:* Each of the `do_*` functions defines local data,
  39   including complex objects encapsulating some data and functionality,
  40   but also data structures effectively used as "global variables" for
  41   communication between different parts of the simulation. Neither the
  42   ownership nor the access rights (except for `const` qualifiers) are
  43   clearly defined.
  44 * *Dependencies:* Many function calls in the `do_*` functions are
  45   dependent on each others, i.e. rely on being called in a specific
  46   order, but these dependencies are not clearly defined.
  47 * *Branches:* The flow of the `do_*` functions are hard to understand
  48   due to branching. At setup time, and then at every step of the
  49   simulation run, a number of booleans are set (e.g. `bNS` (do neighbor
  50   searching), `bCalcEner` (calculate energies), `do_ene` (write
  51   energies), `bEner` (energy calculation needed), etc). These booleans
  52   enable or disable branches of the code (for the current step or the
  53   entire run), mostly encoded as `if(...)` statements in the main `do_*`
  54   loop, but also in functions called from there.
  55 * *Task scheduling:* Poorly defined dependencies and per-step branching
  56   make task scheduling (e.g. parallel execution of independent tasks)
  57   very difficult.
  58 * *Error-prone for developers:* Poorly defined dependencies and unclear
  59   code flow make changing the simulator functions very error-prone,
  60   rendering the implementation of new methods tedious.
  61
  62 ## The modular simulator approach
  63
  64 The main design goals of the new, fully modular simulator approach
  65 include
  66 * *Extensibility:* We want to ease maintenance and the implementation
  67   of new integrator schemes.
  68 * *Monte Carlo:* We want to add MC capability, which can be mixed with
  69   MD to create hybrid MC/MD schemes.
  70 * *Data locality & interfaces:* We aim at localizing data in objects,
  71   and offer interfaces if access from other objects is needed.
  72 * *Multi-stepping:* We aim at a design which intrinsically supports
  73   multi-step integrators, e.g. having force calls at different
  74   frequencies, or avoid having branches including rare events
  75   (trajectory writing, neighbor search, ...) in the computation loops.
  76 * *Task parallelism:* Although not first priority, we want to have a
  77   design which can be extended to allow for task parallelism.
  78
  79 The general design approach is that of a **task scheduler**. *Tasks*
  80 are argument-less functions which perform a part of the computation.
  81 Periodically during the simulation, the scheduler builds a
  82 *queue of tasks*, i.e. a list of tasks which is then run through in
  83 order. Over time, with data dependencies clearly defined, this
  84 approach can be modified to have independent tasks run in parallel.
  85
  86 ### Simulator elements
  87
  88 The task scheduler holds a list of *simulator elements*, defined by
  89 the `ISimulatorElement` interface. These elements have a
  90 `scheduleTask(Step, Time)` function, which gets called by the task
  91 scheduler. This allows the simulator element to register one (or more)
  92 function pointers to be run at that specific `(Step, Time)`. From the
  93 point of view of the element, it is important to note that the
  94 computation will not be carried out immediately, but that it will be
  95 called later during the actual (partial) simulation run. From the
  96 point of view of the builder of the task scheduler, it is important to
  97 note that the order of the elements determines the order in which
  98 computation is performed.
  99
 100     class ISimulatorElement
 101     {
 102     public:
 103         /*! \\brief Query whether element wants to run at step / time
 104          *
 105          * Element can register one or more functions to be run at that step through
 106          * the registration pointer.
 107          */
 108         virtual void scheduleTask(Step, Time, const RegisterRunFunction&) = 0;
 109         //! Method guaranteed to be called after construction, before simulator run
 110         virtual void elementSetup() = 0;
 111         //! Method guaranteed to be called after simulator run, before deconstruction
 112         virtual void elementTeardown() = 0;
 113         //! Standard virtual destructor
 114         virtual ~ISimulatorElement() = default;
 115     };
 116
 117
 118 The task scheduler periodically loops over
 119 its list of elements, builds a queue of function pointers to run, and
 120 returns this list of tasks. As an example, a possible application
 121 would be to build a new queue after each domain-decomposition (DD) /
 122 neighbor-searching (NS) step, which might occur every 100 steps. The
 123 scheduler would loop repeatedly over all its elements, with elements
 124 like the trajectory-writing element registering for only one or no
 125 step at all, the energy-calculation element registering for every
 126 tenth step, and the force, position / velocity propagation, and
 127 constraining algorithms registering for every step. The result would
 128 be a (long) queue of function pointers including all computations
 129 needed until the next DD / NS step, which can be run without any
 130 branching.
 131
 132 ### Signallers
 133
 134 Some elements might require computations by other elements. If for
 135 example, the trajectory writing is an element independent from the
 136 energy-calculation element, it needs to signal to the energy element
 137 that it is about to write a trajectory, and that the energy element
 138 should be ready for that (i.e. perform an energy calculation in the
 139 upcoming step). This requirement, which replaces the boolean branching
 140 in the current implementation, is fulfilled by a Signaller - Client
 141 model. Classes implementing the `ISignaller` interface get called
 142 *before* every loop of the element list, and can inform registered
 143 clients about things happening during that step. The trajectory
 144 element, for example, can tell the energy element that it will write
 145 to trajectory at the end of this step. The energy element can then
 146 register an energy calculation during that step, being ready to write
 147 to trajectory when requested.
 148
 149     class ISignaller
 150     {
 151     public:
 152         //! Function run before every step of scheduling
 153         virtual void signal(Step, Time) = 0;
 154         //! Method guaranteed to be called after construction, before simulator run
 155         virtual void setup() = 0;
 156     };
 157
 158 ### The modular simulator
 159
 160 The approach is most easily displayed using some simplified (pseudo) code.
 161
 162 The simulator itself is responsible to **store the elements in the
 163 right order** (in `addIntegrationElements`) This includes the different
 164 order of elements in different algorithms (e.g. leap-frog vs. velocity
 165 verlet), but also logical dependencies (energy output after compute
 166 globals). Once the algorithm has been built, the simulator simply
 167 executes one task after the next, until the end of the simulation is
 168 reached.
 169
 170     class ModularSimulator : public ISimulator
 171     {
 172         public:
 173             //! Run the simulator
 174             void run() override;
 175     }
 176
 177     void ModularSimulator::run()
 178     {
 179
 180         ModularSimulatorAlgorithmBuilder algorithmBuilder();
 181         addIntegrationElements(&algorithmBuilder);
 182         auto algorithm = algorithmBuilder.build();
 183
 184         while (const auto* task = algorithm.getNextTask())
 185         {
 186             // execute task
 187             (*task)();
 188         }
 189     }
 190
 191 The following snippet illustrates building a leap-frog integration
 192 algorithm. The algorithm builder allows for a concise description of
 193 the simulator algorithm.
 194
 195     void ModularSimulator::addIntegrationElements(ModularSimulatorAlgorithmBuilder* builder)
 196     {
 197         if (legacySimulatorData_->inputrec->eI == eiMD)
 198         {
 199             // The leap frog integration algorithm
 200             builder->add<ForceElement>();
 201              // We have a full state here (positions(t), velocities(t-dt/2), forces(t)
 202             builder->add<StatePropagatorData::Element>();
 203             if (legacySimulatorData_->inputrec->etc == TemperatureCoupling::VRescale)
 204             {
 205                 builder->add<VRescaleThermostat>(-1,
 206                                                  VRescaleThermostatUseFullStepKE::No,
 207                                                  PropagatorTag("LeapFrogPropagator"));
 208             }
 209             builder->add<Propagator<IntegrationStep::LeapFrog>>(PropagatorTag("LeapFrogPropagator"),
 210                                                                 legacySimulatorData_->inputrec->delta_t);
 211             if (legacySimulatorData_->constr)
 212             {
 213                 builder->add<ConstraintsElement<ConstraintVariable::Positions>>();
 214             }
 215             builder->add<ComputeGlobalsElement<ComputeGlobalsAlgorithm::LeapFrog>>();
 216             // We have the energies at time t here
 217             builder->add<EnergyData::Element>();
 218             if (legacySimulatorData_->inputrec->epc == PressureCoupling::ParrinelloRahman)
 219             {
 220                 builder->add<ParrinelloRahmanBarostat>(-1, PropagatorTag("LeapFrogPropagator"));
 221             }
 222         }
 223     }
 224
 225 ### The simulator algorithm
 226
 227 The simulator algorithm is responsible to **decide if elements need to
 228 run at a specific time step**. The elements get called in order, and
 229 decide whether they need to run at a specific step. This can be
 230 pre-computed for multiple steps. In the current implementation, the
 231 tasks are pre-computed for the entire life-time of the neighbor
 232 list.
 233
 234 The simulator algorithm offers functionality to get the next task
 235 from the queue. It owns all elements involved in the simulation
 236 and is hence controlling their lifetime. This ensures that pointers and
 237 callbacks exchanged between elements remain valid throughout the duration
 238 of the simulation run. It also maintains the list of tasks,
 239 and updates it when needed.
 240
 241     class ModularSimulatorAlgorithm
 242     {
 243     public:
 244         //! Get next task in queue
 245         [[nodiscard]] const SimulatorRunFunction* getNextTask();
 246     private:
 247         //! List of signalers
 248         std::vector<std::unique_ptr<ISignaller>> signallerList_;
 249         //! List of elements
 250         std::vector<std::unique_ptr<ISimulatorElement>> elementsList_;
 251
 252         //! The run queue
 253         std::vector<SimulatorRunFunction> taskQueue_;
 254         //! The task iterator
 255         std::vector<SimulatorRunFunction>::const_iterator taskIterator_;
 256
 257         //! Update task queue
 258         void updateTaskQueue();
 259     }
 260
 261 The `getNextTask()` function is returning the next task in the task
 262 queue. It rebuilds the task list when needed.
 263
 264     const SimulatorRunFunction* ModularSimulatorAlgorithm::getNextTask()
 265     {
 266         if (!taskQueue_.empty())
 267         {
 268             taskIterator_++;
 269         }
 270         if (taskIterator_ == taskQueue_.end())
 271         {
 272             if (runFinished_)
 273             {
 274                 return nullptr;
 275             }
 276             updateTaskQueue();
 277             taskIterator_ = taskQueue_.begin();
 278         }
 279         return &*taskIterator_;
 280     }
 281
 282 Updating the task queue involves calling all signallers and
 283 elements for every step of the scheduling period. This refills
 284 the task queue. It is important to keep in mind that the *scheduling step* is not
 285 necessarily identical to the *current step* of the simulation. Most of
 286 the time, the scheduling step is ahead, as we are pre-scheduling steps.
 287
 288     void ModularSimulatorAlgorithm::updateTaskQueue()
 289     {
 290         for (Step schedulingStep = currentStep;
 291              schedulingStep < currentStep + schedulingPeriod;
 292              schedulingStep++)
 293         {
 294             Time time = getTime(schedulingStep);
 295             // Have signallers signal any special treatment of scheduling step
 296             for (const auto& signaller : signallerList)
 297             {
 298                 signaller.signal(schedulingStep, time);
 299             }
 300             // Query all elements whether they need to run at scheduling step
 301             for (const auto& element : signallerList)
 302             {
 303                 element.schedule(schedulingStep, time, registerRunFunction_);
 304             }
 305         }
 306     }
 307
 308 ### Sequence diagrams
 309
 310 #### Pre-loop
 311 In the loop preparation, the signallers and elements are created and
 312 stored in the right order. The signallers and elements can then
 313 perform any setup operations needed.
 314
 315 \msc
 316 hscale="2";
 317
 318 ModularSimulatorBuilder [label="ModularSimulatorAlgorithmBuilder"],
 319 ModularSimulator [label="ModularSimulatorAlgorithm"],
 320 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
 321 Elements [label="ModularSimulatorAlgorithm::\nElements"],
 322 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
 323
 324 --- [ label = "constructElementsAndSignallers()" ];
 325     ModularSimulatorBuilder => Signallers [ label = "Create signallers\nand order them" ];
 326     ModularSimulatorBuilder => Elements [ label = "Create elements\nand order them" ];
 327 --- [ label = "constructElementsAndSignallers()" ];
 328 |||;
 329 |||;
 330
 331 --- [ label = "setupAllElements()" ];
 332     ModularSimulator => Signallers [ label = "Call setup()" ];
 333     Signallers box Signallers [ label = "for signaler in Signallers\n    signaller->setup()" ];
 334     |||;
 335     ModularSimulator => Elements [ label = "Call setup()" ];
 336     Elements box Elements [ label = "for element in Elements\n    element->setup()" ];
 337 --- [ label = "setupAllElements()" ];
 338 \endmsc
 339
 340 #### Main loop
 341 The main loop consists of two parts which are alternately run until the
 342 simulation stop criterion is met. The first part is the population of
 343 the task queue, which determines all tasks that will have to run to
 344 simulate the system for a given time period. In the current implementation,
 345 the scheduling period is set equal to the lifetime of the neighborlist.
 346 Once the tasks have been predetermined, the simulator runs them in order.
 347 This is the actual simulation computation, which can now run without any
 348 branching.
 349
 350 \msc
 351 hscale="2";
 352
 353 ModularSimulator [label="ModularSimulatorAlgorithm"],
 354 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
 355 Elements [label="ModularSimulatorAlgorithm::\nElements"],
 356 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
 357
 358 ModularSimulator box TaskQueue [ label = "loop: while(not lastStep)" ];
 359 ModularSimulator note TaskQueue [ label = "The task queue is empty. The simulation state is at step N.", textbgcolor="yellow" ];
 360 |||;
 361 |||;
 362 ModularSimulator box ModularSimulator [ label = "populateTaskQueue()" ];
 363 ModularSimulator =>> TaskQueue [ label = "Fill task queue with tasks until next neighbor-searching step" ];
 364 |||;
 365 |||;
 366 ModularSimulator note TaskQueue [ label = "The task queue now holds all tasks needed to move the simulation from step N to step N + nstlist. The simulation for these steps has not been performed yet, however. The simulation state is hence still at step N.", textbgcolor="yellow" ];
 367 |||;
 368 |||;
 369
 370 ModularSimulator => TaskQueue [ label = "Run all tasks in TaskQueue" ];
 371 TaskQueue box TaskQueue [label = "for task in TaskQueue\n    run task" ];
 372 TaskQueue note TaskQueue [ label = "All simulation computations are happening in this loop!", textbgcolor="yellow" ];
 373 |||;
 374 |||;
 375 ModularSimulator note TaskQueue [ label = "The task queue is now empty. The simulation state is at step N + nstlist.", textbgcolor="yellow" ];
 376 ModularSimulator box TaskQueue [ label = "end loop: while(not lastStep)" ];
 377
 378 \endmsc
 379
 380 #### Task scheduling
 381 A part of the main loop, the task scheduling in `populateTaskQueue()`
 382 allows the elements to push tasks to the task queue. For every scheduling
 383 step, the signallers are run first to give the elements information about
 384 the upcoming scheduling step. The scheduling routine elements are then
 385 called in order, allowing the elements to register their respective tasks.
 386
 387 \msc
 388 hscale="2";
 389
 390 ModularSimulator [label="ModularSimulatorAlgorithm"],
 391 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
 392 Elements [label="ModularSimulatorAlgorithm::\nElements"],
 393 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
 394
 395 --- [ label = "populateTaskQueue()" ];
 396     ModularSimulator box ModularSimulator [ label = "doDomainDecomposition()\ndoPmeLoadBalancing()" ];
 397     ModularSimulator =>> Elements [ label = "Update state and topology" ];
 398     |||;
 399     |||;
 400
 401     ModularSimulator note ModularSimulator [ label = "schedulingStep == N\nsimulationStep == N", textbgcolor="yellow" ];
 402     ModularSimulator box TaskQueue [ label = "loop: while(not nextNeighborSearchingStep)" ];
 403         ModularSimulator => Signallers [ label = "Run signallers for schedulingStep" ];
 404         Signallers box Signallers [label = "for signaller in Signallers\n    signaller->signal(scheduleStep)" ];
 405         Signallers =>> Elements [ label = "notify" ];
 406         Signallers note Elements [ label = "The elements now know if schedulingStep has anything special happening, e.g. neighbor searching, log writing, trajectory writing, ...", textbgcolor="yellow" ];
 407         |||;
 408         |||;
 409
 410         ModularSimulator => Elements [ label = "Schedule run functions for schedulingStep" ];
 411         Elements box Elements [label = "for element in Elements\n    element->scheduleTask(scheduleStep)" ];
 412         Elements =>> TaskQueue [ label = "Push task" ];
 413         Elements note TaskQueue [ label = "The elements have now registered everything they will need to do for schedulingStep.", textbgcolor="yellow" ];
 414         ModularSimulator note ModularSimulator [ label = "schedulingStep++", textbgcolor="yellow" ];
 415
 416     ModularSimulator box TaskQueue [ label = "end loop: while(not nextNeighborSearchingStep)" ];
 417 --- [ label = "populateTaskQueue()" ];
 418 ModularSimulator note ModularSimulator [ label = "schedulingStep == N + nstlist\nsimulationStep == N", textbgcolor="yellow" ];
 419
 420 \endmsc
 421
 422 ## Acceptance tests and further plans
 423
 424 Acceptance tests which need to be
 425 fulfilled to make the modular simulator the default code path:
 426 * End-to-end tests pass on both `do_md` and the new loop in
 427   Gitlab pre- and post-submit pipelines
 428 * Physical validation cases pass on the new loop
 429 * Performance on different sized benchmark cases, x86 CPU-only
 430   and NVIDIA GPU are at most 1% slower -
 431   https://github.com/ptmerz/gmxbenchmark has been developed to
 432   this purpose.
 433
 434 After the MD bare minimum, we will want to add support for
 435 * Pulling
 436 * Full support of GPU (current implementation does not support
 437 GPU update)
 438
 439 Using the new modular simulator framework, we will then explore
 440 adding new functionality to GROMACS, including
 441 * Monte Carlo barostat
 442 * hybrid MC/MD schemes
 443 * multiple-time-stepping integration
 444
 445 We will also explore optimization opportunities, including
 446 * re-use of the same queue if conditions created by user input are
 447   sufficiently favorable (by design or when observed)
 448 * simultaneous execution of independent tasks
 449
 450 We will probably not prioritize support for (and might consider
 451 deprecating from do_md in a future GROMACS version)
 452 * Simulated annealing
 453 * REMD
 454 * Simulated tempering
 455 * Multi-sim
 456 * Membrane embedding
 457 * QM/MM
 458 * FEP lambda vectors
 459 * Fancy mdp options for FEP output
 460 * MTTK
 461 * Essential dynamics
 462 * Constant acceleration groups
 463 * Ensemble-averaged restraints
 464 * Time-averaged restraints
 465 * Freeze, deform, cos-acceleration
 466
 467 ## Signaller and element details
 468
 469 The current implementation of the modular simulator consists of
 470 the following signallers and elements:
 471
 472 ### Signallers
 473
 474 All signallers have a list of pointers to clients, objects that
 475 implement a respective interface and get notified of events the
 476 signaller is communicating.
 477
 478 * `NeighborSearchSignaller`: Informs its clients whether the
 479   current step is a neighbor-searching step.
 480 * `LastStepSignaller`: Informs its clients when the current step
 481   is the last step of the simulation.
 482 * `LoggingSignaller`: Informs its clients whether output to the
 483   log file is written in the current step.
 484 * `EnergySignaller`: Informs its clients about energy related
 485   special steps, namely energy calculation steps, virial
 486   calculation steps, and free energy calculation steps.
 487 * `TrajectorySignaller`: Informs its clients if writing to
 488   trajectory (state [x/v/f] and/or energy) is planned for the
 489   current step.
 490
 491 ### Simulator Elements
 492
 493 #### `TrajectoryElement`
 494 The `TrajectoryElement` is calling its trajectory clients, passing them a valid
 495 output pointer and letting them write to trajectory. Unlike the
 496 legacy implementation, the trajectory element itself knows nothing
 497 about the data that is written to file - it is only responsible
 498 to inform clients about trajectory steps, and providing a valid
 499 file pointer to the objects that need to write to trajectory.
 500
 501 #### `ComputeGlobalsElement`
 502 The `ComputeGlobalsElement` encapsulates the legacy calls to
 503 `compute_globals`. While a new approach to the global reduction
 504 operations has been discussed, it is currently not part of this
 505 effort. This element therefore aims at offering an interface
 506 to the legacy implementation which is compatible with the new
 507 simulator approach.
 508
 509 The element currently comes in 3 (templated) flavors: the leap-frog case,
 510 the first call during a velocity-verlet integrator, and the second call
 511 during a velocity-verlet integrator. It is the responsibility of the
 512 simulator builder to place them at the right place of the
 513 integration algorithm.
 514
 515 #### `ForceElement` and `ShellFCElement`
 516 The `ForceElement` and the `ShellFCElement` encapsulate the legacy
 517 calls to `do_force` and `do_shellfc`, respectively. It is the
 518 responsibility of the simulator builder to place them at the right
 519 place of the integration algorithm. Moving forward, a version of these
 520 elements which would allow calling of do_force with subsets of the topology
 521 would be desirable to pave the way towards multiple time step integrators
 522 within modular simulator, allowing to integrate slower degrees of freedom
 523 at a different frequency than faster degrees of freedom.
 524
 525 #### `ConstraintElement`
 526 The constraint element is implemented for the two cases of constraining
 527 both positions and velocities, and only velocities. It does not change the constraint
 528 implementation itself, but replaces the legacy `constrain_coordinates`
 529 and `constrain_velocities` calls from update.h by elements implementing
 530 the ISimulatorElement interface and using the new data management.
 531
 532 #### `Propagator`
 533 The propagator element can, through templating, cover the different
 534 propagation types used in the currently implemented MD schemes. The
 535 combination of templating, static functions, and having only the
 536 inner-most operations in the static functions allows to have performance
 537 comparable to fused update elements while keeping easily re-orderable
 538 single instructions.
 539
 540 Currently, the (templated) implementation covers four cases:
 541  * *PositionsOnly:* Moves the position vector by the given time step
 542  * *VelocitiesOnly:* Moves the velocity vector by the given time step
 543  * *LeapFrog:* A manual fusion of the previous two propagators
 544  * *VelocityVerletPositionsAndVelocities:* A manual fusion of VelocitiesOnly
 545     and PositionsOnly, where VelocitiesOnly is only propagated by half the
 546     time step of PositionsOnly.
 547
 548 The propagators also allow to implement temperature and pressure coupling
 549 schemes by offering (templated) scaling of the velocities. In order to
 550 link temperature / pressure coupling objects to the propagators, the
 551 propagator objects have a tag (of strong type `PropagatorTag`). The
 552 temperature and pressure coupling objects can then connect to the
 553 matching propagator by comparing their target tag to the different
 554 propagators. Giving the propagators their tags and informing the
 555 temperature and pressure coupling algorithms which propagator they are
 556 connecting to is in the responsibility of the simulation algorithm
 557 builder.
 558
 559 #### `CompositeSimulatorElement`
 560 The composite simulator element takes a list of elements and implements
 561 the ISimulatorElement interface, making a group of elements effectively
 562 behave as one. This simplifies building algorithms.
 563
 564 #### `VRescaleThermostat`
 565 The `VRescaleThermostat` implements the v-rescale thermostat. It takes a
 566 callback to the propagator and updates the velocity scaling factor
 567 according to the v-rescale thermostat formalism.
 568
 569 #### `ParrinelloRahmanBarostat`
 570 The `ParrinelloRahmanBarostat` implements the Parrinello-Rahman barostat.
 571 It integrates the Parrinello-Rahman box velocity equations, takes a
 572 callback to the propagator to update the velocity scaling factor, and
 573 scales the box and the positions of the system.
 574
 575 #### `StatePropagatorData::Element`
 576 The `StatePropagatorData::Element` takes part in the simulator run, as it might
 577 have to save a valid state at the right moment during the
 578 integration. Placing the StatePropagatorData correctly is for now the
 579 duty of the simulator builder - this might be automated later
 580 if we have enough meta-data of the variables (i.e., if
 581 `StatePropagatorData` knows at which time the variables currently are,
 582 and can decide when a valid state (full-time step of all
 583 variables) is reached. The `StatePropagatorData::Element` is also a client of
 584 both the trajectory signaller and writer - it will save a
 585 state for later writeout during the simulator step if it
 586 knows that trajectory writing will occur later in the step,
 587 and it knows how to write to file given a file pointer by
 588 the `TrajectoryElement`.
 589
 590 #### `EnergyData::Element`
 591 The `EnergyData::Element` takes part in the simulator run,
 592 either adding data (at energy calculation steps), or
 593 recording a non-calculation step (all other steps). It is the
 594 responsibility of the simulator builder to ensure that the
 595 `EnergyData::Element` is called at a point of the simulator run
 596 at which it has access to a valid energy state.
 597
 598 It subscribes to the trajectory signaller, the energy signaller,
 599 and the logging signaller to know when an energy calculation is
 600 needed and when a non-recording step is enough. The EnergyData
 601 element is also a subscriber to the trajectory writer element,
 602 as it is responsible to write energy data to trajectory.
 603
 604 #### `FreeEnergyPerturbationData::Element`
 605 The `FreeEnergyPerturbationData::Element` is a member class of
 606 `FreeEnergyPerturbationData` that updates the lambda
 607 values during the simulation run if lambda is non-static. It
 608 implements the checkpointing client interface to save the current
 609 state of `FreeEnergyPerturbationData` for restart.
 610
 611 ## Data structures
 612
 613 ### `StatePropagatorData`
 614 The `StatePropagatorData` contains a little more than the pure
 615 statistical-physical micro state, namely the positions,
 616 velocities, forces, and box matrix, as well as a backup of
 617 the positions and box of the last time step. While it takes
 618 part in the simulator loop to be able to backup positions /
 619 boxes and save the current state if needed, it's main purpose
 620 is to offer access to its data via getter methods. All elements
 621 reading or writing to this data need a pointer to the
 622 `StatePropagatorData` and need to request their data explicitly. This
 623 will later simplify the understanding of data dependencies
 624 between elements.
 625
 626 Note that the `StatePropagatorData` can be converted to and from the
 627 legacy `t_state` object. This is useful when dealing with
 628 functionality which has not yet been adapted to use the new
 629 data approach. Of the elements currently implemented, only
 630 domain decomposition, PME load balancing, and the initial
 631 constraining are using this.
 632
 633 ### `EnergyData`
 634 The EnergyData owns the EnergyObject, and is hence responsible
 635 for saving energy data and writing it to trajectory.
 636 The EnergyData offers an interface to add virial contributions,
 637 but also allows access to the raw pointers to tensor data, the
 638 dipole vector, and the legacy energy data structures.
 639
 640 ### `FreeEnergyPerturbationData`
 641 The `FreeEnergyPerturbationData` holds the lambda vector and the
 642 current FEP state, offering access to its values via getter
 643 functions.
 644
 645 ## Simulator algorithm builder
 646 Elements that define the integration algorithm (i.e. which are
 647 added using the templated `ModularSimulatorAlgorithmBuilder::add`
 648 method) need to implement a `getElementPointerImpl` factory function.
 649 This gives them access to the data structures and some other
 650 infrastructure, but also allows elements to accept additional
 651 arguments (e.g frequency, offset, ...).
 652
 653     template<typename Element, typename... Args>
 654     void ModularSimulatorAlgorithmBuilder::add(Args&&... args)
 655     {
 656         // Get element from factory method
 657         auto* element = static_cast<Element*>(getElementPointer<Element>(
 658                 legacySimulatorData_, &elementAdditionHelper_, statePropagatorData_.get(),
 659                 energyData_.get(), freeEnergyPerturbationData_.get(), &globalCommunicationHelper_,
 660                 std::forward<Args>(args)...));
 661
 662         // Make sure returned element pointer is owned by *this
 663         // Ensuring this makes sure we can control the life time
 664         if (!elementExists(element))
 665         {
 666             throw ElementNotFoundError("Tried to append non-existing element to call list.");
 667         }
 668
 669         // Register element with infrastructure
 670     }
 671
 672 Note that `getElementPointer<Element>` will call `Element::getElementPointerImpl`,
 673 which needs to be implemented by the different elements.
 674
 675 ## Infrastructure
 676 ### `DomDecHelper` and `PmeLoadBalanceHelper`
 677 These infrastructure elements are responsible for domain decomposition and
 678 PME load balancing, respectively. They encapsulate function calls which are
 679 important for performance, but outside the scope of this effort. They rely
 680 on legacy data structures for the state (both) and the topology (domdec).
 681
 682 The elements do not implement the ISimulatorElement interface, as
 683 the Simulator is calling them explicitly between task queue population
 684 steps. This allows elements to receive the new topology / state before
 685 deciding what functionality they need to run.
 686
 687 ### Checkpointing
 688 The `CheckpointHelper` is responsible to write checkpoints, and to offer
 689 its clients access to the data read from checkpoint.
 690
 691 Writing checkpoints is done just before neighbor-searching (NS) steps,
 692 or before the last step. Checkpointing occurs periodically (by default,
 693 every 15 minutes), and needs two NS steps to take effect - on the first
 694 NS step, the checkpoint helper on master rank signals to all other ranks
 695 that checkpointing is about to occur. At the next NS step, the checkpoint
 696 is written. On the last step, checkpointing happens immediately before the
 697 step (no signalling). To be able to react to last step being signalled,
 698 the CheckpointHelper also implements the `ISimulatorElement` interface,
 699 but only registers a function if the last step has been called.
 700
 701 Checkpointing happens at the top of a simulation step, which gives a
 702 straightforward re-entry point at the top of the simulator loop.
 703
 704 #### Implementation details
 705 ##### Other (older) approaches
 706 **Legacy checkpointing approach:** All data to be checkpointed needs to be
 707 stored in one of the following data structures:
 708 * `t_state`, which also holds pointers to
 709   - `history_t` (history for restraints)
 710   - `df_history_t` (history for free energy)
 711   - `ekinstate`
 712   - `AwhHistory`
 713 * `ObservableHistory`, consisting of
 714   - `energyhistory_t`
 715   - `PullHistory`
 716   - `edsamhistory_t`
 717   - `swaphistory_t`
 718 * Checkpoint further saves details about the output files being used
 719
 720 These data structures are then serialized by a function having knowledge of
 721 their implementation details. One possibility to add data to the checkpoint
 722 is to expand one of the objects that is currently being checkpointed, and
 723 edit the respective `do_cpt_XXX` function in `checkpoint.cpp` which interacts
 724 with the XDR library. The alternative would be to write an entirely new data
 725 structure, changing the function signature of all checkpoint-related functions,
 726 and write a corresponding low-level routine interacting with the XDR library.
 727
 728 **The MDModule approach:** To allow for modules to write checkpoints, the legacy
 729 checkpoint was extended by a KVTree. When writing to checkpoint, this tree gets
 730 filled (via callbacks) by the single modules, and then serialized. When reading,
 731 the KVTree gets deserialized, and then distributed to the modules which can read
 732 back the data previously stored.
 733
 734 ##### Modular simulator design
 735
 736 The MDModule checks off almost all requirements to a modularized checkpointing format.
 737 The proposed design is therefore an evolved form of this approach. Notably, two
 738 improvements include
 739 * Hide the implementation details of the data structure holding the data (currently,
 740   a KV-Tree) from the clients. This allows to change the implementation details of
 741   reading / writing checkpoints without touching client code.
 742 * Offer a unified way to read and write to data, allowing clients to write one
 743   (templated) function to read to and write from checkpoint. This allows to
 744   eliminate code duplication and the danger of having read and write functions
 745   getting out of sync.
 746
 747 The modular simulator checkpointing does not currently change the way that the
 748 legacy simulator is checkpointing. Some data structures involved in the legacy
 749 checkpoint did, however, get an implementation of the new approach. This is
 750 needed for ModularSimulator checkpointing, but also gives a glimpse of how
 751 implementing this for legacy data structures would look like.
 752
 753 The most important design part is the `CheckpointData` class. It exposes methods
 754 to read and write scalar values, ArrayRefs, and tensors. It also allows to create
 755 a "sub-object" of the same type `CheckpointData` which allows to have more complex
 756 members implement their own checkpointing routines (without having to be aware that
 757 they are a member). All methods are templated on the chosen operation,
 758 `CheckpointDataOperation::Read` or `CheckpointDataOperation::Write`, allowing clients
 759 to use the same code to read and write to checkpoint. Type traits and constness are
 760 used to catch as many errors as possible at compile time. `CheckpointData` uses a
 761 KV-tree to store the data internally. This is however never exposed to the client.
 762 Having this abstraction layer gives freedom to change the internal implementation
 763 in the future.
 764
 765 All `CheckpointData` objects are owned by a `ReadCheckpointDataHolder` or
 766 `WriteCheckpointDataHolder`. These holder classes own the internal KV-tree, and offer
 767 `deserialize(ISerializer*)` and `serialize(ISerializer*)` functions, respectively,
 768 which allow to read from / write to file. This separation clearly defines ownership
 769 and separates the interface aimed at file IO from the interface aimed at objects
 770 reading / writing checkpoints.
 771
 772 Checkpointing for modular simulator is tied in the general checkpoint facility by
 773 passing a `ReadCheckpointDataHolder` or `WriteCheckpointDataHolder` object to the
 774 legacy checkpoint read and write operations.
 775
 776 ##### Notes about the modular simulator checkpointing design
 777
 778 **Distinction of data between clients:** The design requires that separate
 779 clients have independent sub-`CheckpointData` objects identified by a unique key.
 780 This key is the only thing that needs to be unique between clients, i.e. clients are
 781 free to use any key _within_ their sub-`CheckpointData` without danger to overwrite
 782 data used by other clients.
 783
 784 **Versioning:** The design relies on clients keeping their own versioning system
 785 within their sub-`CheckpointData` object. As the data stored by clients is opaque
 786 to the top level checkpointing facility, it has no way to know when the internals
 787 change. Only fundamental changes to the checkpointing architecture can still be
 788 tracked by a top-level checkpoint version.
 789
 790 **Key existence:** The currently uploaded design does not allow to check whether
 791 a key is present in `CheckpointData`. This could be introduced if needed - however,
 792 if clients write self-consistent read and write code, this should never be needed.
 793 Checking for key existence seems rather to be a lazy way to circumvent versioning,
 794 and is therefore discouraged.
 795
 796 **Callback method:** The modular simulator and MDModules don't use the exact same
 797 way of communicating with clients. The two methods could be unified if needed.
 798 The only _fundamental_ difference is that modular simulator clients need to identify
 799 with a unique key to receive their dedicated sub-data, while MDModules all read from
 800 and write to the same KV-tree. MDModules could be adapted to that by either requiring
 801 a unique key from the modules, or by using the same `CheckpointData` for all modules
 802 and using a single unique key (something like "MDModules") to register that object
 803 with the global checkpoint.
 804
 805 **Performance:** One of the major differences between the new approach and the legacy
 806 checkpointing is that data gets _copied_ into `CheckpointData`, while the legacy
 807 approach only took a pointer to the data and serialized it. This slightly reduces
 808 performance. Some thoughts on that:
 809 * By default, checkpointing happens at the start of the simulation (only if reading
 810 from checkpoint), every 15 minutes during simulation runs, and at the end of the
 811 simulation run. This makes it a low priority target for optimization. Consequently,
 812 not much thoughts have been put in the optimization, but there's certainly some way
 813 to improve things here and there if we consider it necessary.
 814 * The copying will only have measurable effect when large data gets checkpointed -
 815 likely only for saving the positions / velocities of the entire state, so that
 816 should be the first target for optimization if needed.
 817 * Copying data could have advantages moving forward - we could continue the
 818 simulation as soon as the data is written to the `CheckpointData` object, and don't
 819 necessarily need to wait for writing to the physical medium to happen. It also
 820 simplifies moving the point at which checkpointing is performed within the
 821 integrator. One could envision clients storing their data any time during the
 822 integration step, and serializing the resulting `CheckpointData` after the step.
 823 This avoids the need to find a single point within the loop at which all clients
 824 need to be in a state suitable for checkpointing.
 825 * If, however, we wanted to use the same approach for other, more frequent
 826 (and hence more perfomance critical) operations such as saving/restoring states
 827 for MC type algorithms or swapping of states between running simulations in
 828 multi-sim type settings, performance would become more of an issue.
 829
 830 **ISerializer vs KV-tree:** The new approach uses a KV tree internally. The
 831 KV tree is well suited to represent the design philosophy of the approach:
 832 Checkpointing offers a functionality which allows clients to write/read any data
 833 they want. This data remains opaque to the checkpointing element. Clients can
 834 read or write in any order, and in the future, maybe even simultaneously. Data
 835 written by any element should be protected from access from other elements to
 836 avoid bugs. The downside of the KV tree is that all data gets copied before
 837 getting written to file (see above).
 838
 839 Replacing a KV tree by a single ISerializer object which gets passed to clients
 840 would require elements to read and write sequentially in a prescribed order. With
 841 the help of InMemorySerializer, a KV-Tree could likely be emulated (with sub-objects
 842 that serialize to memory, and then a parent object that serializes this memory to
 843 file), but that doesn't present a clear advantage anymore.
 844
 845 ### `TopologyHolder`
 846 The topology object owns the local topology and holds a constant reference
 847 to the global topology owned by the ISimulator.
 848
 849 The local topology is only infrequently changed if domain decomposition is
 850 on, and never otherwise. The topology holder therefore offers elements to register
 851 as ITopologyHolderClients. If they do so, they get a handle to the updated local
 852 topology whenever it is changed, and can rely that their handle is valid
 853 until the next update. The domain decomposition element is defined as friend
 854 class to be able to update the local topology when needed.