docs/doxygen/lib/modularsimulator.md

   1 The modular simulator {#page_modularsimulator}
   2 ==============================================
   3
   4 A new modular approach to the GROMACS simulator is described. The
   5 simulator in GROMACS is the object which carries out a simulation. The
   6 simulator object is created and owned by the runner object, which is
   7 outside of the scope of this new approach, and will hence not be further
   8 described. The simulator object provides access to some generally used
   9 data, most of which is owned by the runner object.
  10
  11 ## Using the modular simulator
  12 GROMACS will automatically use the modular simulator for the velocity
  13 verlet integrator (`integrator = md-vv`), if the functionality chosen
  14 in the other input parameters is implemented in the new framework.
  15 Currently, this includes NVE, NVT, NPH, and NPT simulations,
  16 with or without free energy perturbation, using thermodynamic
  17 boundary conditions
  18
  19 * `tcoupl`: `no`, `v-rescale`, or `berendsen`
  20 * `pcoupl`: `no` or `parrinello-rahman`
  21
  22 To disable the modular simulator for cases defaulting to the new framework,
  23 the environment variable `GMX_DISABLE_MODULAR_SIMULATOR=ON` can be set. To
  24 use the new framework also for `integrator = md` (where the functionality is
  25 implemented), the environment variable `GMX_USE_MODULAR_SIMULATOR=ON` can
  26 be set to override legacy default.
  27
  28 ## Legacy implementation
  29
  30 In the legacy implementation, the simulator consisted of a number of
  31 independent functions carrying out different type of simulations, such
  32 as `do_md` (MD simulations), `do_cg` and `do_steep` (minimization),
  33 `do_rerun` (force and energy evaluation of simulation trajectories),
  34 `do_mimic` (MiMiC QM/MM simulations), `do_nm` (normal mode analysis),
  35 and `do_tpi` (test-particle insertion).
  36
  37 The legacy approach has some obvious drawbacks:
  38 * *Data management:* Each of the `do_*` functions defines local data,
  39   including complex objects encapsulating some data and functionality,
  40   but also data structures effectively used as "global variables" for
  41   communication between different parts of the simulation. Neither the
  42   ownership nor the access rights (except for `const` qualifiers) are
  43   clearly defined.
  44 * *Dependencies:* Many function calls in the `do_*` functions are
  45   dependent on each others, i.e. rely on being called in a specific
  46   order, but these dependencies are not clearly defined.
  47 * *Branches:* The flow of the `do_*` functions are hard to understand
  48   due to branching. At setup time, and then at every step of the
  49   simulation run, a number of booleans are set (e.g. `bNS` (do neighbor
  50   searching), `bCalcEner` (calculate energies), `do_ene` (write
  51   energies), `bEner` (energy calculation needed), etc). These booleans
  52   enable or disable branches of the code (for the current step or the
  53   entire run), mostly encoded as `if(...)` statements in the main `do_*`
  54   loop, but also in functions called from there.
  55 * *Task scheduling:* Poorly defined dependencies and per-step branching
  56   make task scheduling (e.g. parallel execution of independent tasks)
  57   very difficult.
  58 * *Error-prone for developers:* Poorly defined dependencies and unclear
  59   code flow make changing the simulator functions very error-prone,
  60   rendering the implementation of new methods tedious.
  61
  62 ## The modular simulator approach
  63
  64 The main design goals of the new, fully modular simulator approach
  65 include
  66 * *Extensibility:* We want to ease maintenance and the implementation
  67   of new integrator schemes.
  68 * *Monte Carlo:* We want to add MC capability, which can be mixed with
  69   MD to create hybrid MC/MD schemes.
  70 * *Data locality & interfaces:* We aim at localizing data in objects,
  71   and offer interfaces if access from other objects is needed.
  72 * *Multi-stepping:* We aim at a design which intrinsically supports
  73   multi-step integrators, e.g. having force calls at different
  74   frequencies, or avoid having branches including rare events
  75   (trajectory writing, neighbor search, ...) in the computation loops.
  76 * *Task parallelism:* Although not first priority, we want to have a
  77   design which can be extended to allow for task parallelism.
  78
  79 The general design approach is that of a **task scheduler**. *Tasks*
  80 are argument-less functions which perform a part of the computation.
  81 Periodically during the simulation, the scheduler builds a
  82 *queue of tasks*, i.e. a list of tasks which is then run through in
  83 order. Over time, with data dependencies clearly defined, this
  84 approach can be modified to have independent tasks run in parallel.
  85
  86 ### Simulator elements
  87
  88 The task scheduler holds a list of *simulator elements*, defined by
  89 the `ISimulatorElement` interface. These elements have a
  90 `scheduleTask(Step, Time)` function, which gets called by the task
  91 scheduler. This allows the simulator element to register one (or more)
  92 function pointers to be run at that specific `(Step, Time)`. From the
  93 point of view of the element, it is important to note that the
  94 computation will not be carried out immediately, but that it will be
  95 called later during the actual (partial) simulation run. From the
  96 point of view of the builder of the task scheduler, it is important to
  97 note that the order of the elements determines the order in which
  98 computation is performed.
  99
 100     class ISimulatorElement
 101     {
 102     public:
 103         /*! \\brief Query whether element wants to run at step / time
 104          *
 105          * Element can register one or more functions to be run at that step through
 106          * the registration pointer.
 107          */
 108         virtual void scheduleTask(Step, Time, const RegisterRunFunction&) = 0;
 109         //! Method guaranteed to be called after construction, before simulator run
 110         virtual void elementSetup() = 0;
 111         //! Method guaranteed to be called after simulator run, before deconstruction
 112         virtual void elementTeardown() = 0;
 113         //! Standard virtual destructor
 114         virtual ~ISimulatorElement() = default;
 115     };
 116
 117
 118 The task scheduler periodically loops over
 119 its list of elements, builds a queue of function pointers to run, and
 120 returns this list of tasks. As an example, a possible application
 121 would be to build a new queue after each domain-decomposition (DD) /
 122 neighbor-searching (NS) step, which might occur every 100 steps. The
 123 scheduler would loop repeatedly over all its elements, with elements
 124 like the trajectory-writing element registering for only one or no
 125 step at all, the energy-calculation element registering for every
 126 tenth step, and the force, position / velocity propagation, and
 127 constraining algorithms registering for every step. The result would
 128 be a (long) queue of function pointers including all computations
 129 needed until the next DD / NS step, which can be run without any
 130 branching.
 131
 132 ### Signallers
 133
 134 Some elements might require computations by other elements. If for
 135 example, the trajectory writing is an element independent from the
 136 energy-calculation element, it needs to signal to the energy element
 137 that it is about to write a trajectory, and that the energy element
 138 should be ready for that (i.e. perform an energy calculation in the
 139 upcoming step). This requirement, which replaces the boolean branching
 140 in the current implementation, is fulfilled by a Signaller - Client
 141 model. Classes implementing the `ISignaller` interface get called
 142 *before* every loop of the element list, and can inform registered
 143 clients about things happening during that step. The trajectory
 144 element, for example, can tell the energy element that it will write
 145 to trajectory at the end of this step. The energy element can then
 146 register an energy calculation during that step, being ready to write
 147 to trajectory when requested.
 148
 149     class ISignaller
 150     {
 151     public:
 152         //! Function run before every step of scheduling
 153         virtual void signal(Step, Time) = 0;
 154         //! Method guaranteed to be called after construction, before simulator run
 155         virtual void setup() = 0;
 156     };
 157
 158 ### The modular simulator
 159
 160 The approach is most easily displayed using some simplified (pseudo) code.
 161
 162 The simulator itself is responsible to **store the elements in the
 163 right order** (in `addIntegrationElements`) This includes the different
 164 order of elements in different algorithms (e.g. leap-frog vs. velocity
 165 verlet), but also logical dependencies (energy output after compute
 166 globals). Once the algorithm has been built, the simulator simply
 167 executes one task after the next, until the end of the simulation is
 168 reached.
 169
 170     class ModularSimulator : public ISimulator
 171     {
 172         public:
 173             //! Run the simulator
 174             void run() override;
 175     }
 176
 177     void ModularSimulator::run()
 178     {
 179
 180         ModularSimulatorAlgorithmBuilder algorithmBuilder();
 181         addIntegrationElements(&algorithmBuilder);
 182         auto algorithm = algorithmBuilder.build();
 183
 184         while (const auto* task = algorithm.getNextTask())
 185         {
 186             // execute task
 187             (*task)();
 188         }
 189     }
 190
 191 The following snippet illustrates building a leap-frog integration
 192 algorithm. The algorithm builder allows for a concise description of
 193 the simulator algorithm.
 194
 195     void ModularSimulator::addIntegrationElements(ModularSimulatorAlgorithmBuilder* builder)
 196     {
 197         if (legacySimulatorData_->inputrec->eI == eiMD)
 198         {
 199             // The leap frog integration algorithm
 200             builder->add<ForceElement>();
 201              // We have a full state here (positions(t), velocities(t-dt/2), forces(t)
 202             builder->add<StatePropagatorData::Element>();
 203             if (legacySimulatorData_->inputrec->etc == etcVRESCALE)
 204             {
 205                 builder->add<VRescaleThermostat>(-1, VRescaleThermostatUseFullStepKE::No);
 206             }
 207             builder->add<Propagator<IntegrationStep::LeapFrog>>(legacySimulatorData_->inputrec->delta_t,
 208                                                                 RegisterWithThermostat::True,
 209                                                                 RegisterWithBarostat::True);
 210             if (legacySimulatorData_->constr)
 211             {
 212                 builder->add<ConstraintsElement<ConstraintVariable::Positions>>();
 213             }
 214             builder->add<ComputeGlobalsElement<ComputeGlobalsAlgorithm::LeapFrog>>();
 215             // We have the energies at time t here
 216             builder->add<EnergyData::Element>();
 217             if (legacySimulatorData_->inputrec->epc == epcPARRINELLORAHMAN)
 218             {
 219                 builder->add<ParrinelloRahmanBarostat>(-1);
 220             }
 221         }
 222     }
 223
 224 ### The simulator algorithm
 225
 226 The simulator algorithm is responsible to **decide if elements need to
 227 run at a specific time step**. The elements get called in order, and
 228 decide whether they need to run at a specific step. This can be
 229 pre-computed for multiple steps. In the current implementation, the
 230 tasks are pre-computed for the entire life-time of the neighbor
 231 list.
 232
 233 The simulator algorithm offers functionality to get the next task
 234 from the queue. It owns all elements involved in the simulation
 235 and is hence controlling their lifetime. This ensures that pointers and
 236 callbacks exchanged between elements remain valid throughout the duration
 237 of the simulation run. It also maintains the list of tasks,
 238 and updates it when needed.
 239
 240     class ModularSimulatorAlgorithm
 241     {
 242     public:
 243         //! Get next task in queue
 244         [[nodiscard]] const SimulatorRunFunction* getNextTask();
 245     private:
 246         //! List of signalers
 247         std::vector<std::unique_ptr<ISignaller>> signallerList_;
 248         //! List of elements
 249         std::vector<std::unique_ptr<ISimulatorElement>> elementsList_;
 250
 251         //! The run queue
 252         std::vector<SimulatorRunFunction> taskQueue_;
 253         //! The task iterator
 254         std::vector<SimulatorRunFunction>::const_iterator taskIterator_;
 255
 256         //! Update task queue
 257         void updateTaskQueue();
 258     }
 259
 260 The `getNextTask()` function is returning the next task in the task
 261 queue. It rebuilds the task list when needed.
 262
 263     const SimulatorRunFunction* ModularSimulatorAlgorithm::getNextTask()
 264     {
 265         if (!taskQueue_.empty())
 266         {
 267             taskIterator_++;
 268         }
 269         if (taskIterator_ == taskQueue_.end())
 270         {
 271             if (runFinished_)
 272             {
 273                 return nullptr;
 274             }
 275             updateTaskQueue();
 276             taskIterator_ = taskQueue_.begin();
 277         }
 278         return &*taskIterator_;
 279     }
 280
 281 Updating the task queue involves calling all signallers and
 282 elements for every step of the scheduling period. This refills
 283 the task queue. It is important to keep in mind that the *scheduling step* is not
 284 necessarily identical to the *current step* of the simulation. Most of
 285 the time, the scheduling step is ahead, as we are pre-scheduling steps.
 286
 287     void ModularSimulatorAlgorithm::updateTaskQueue()
 288     {
 289         for (Step schedulingStep = currentStep;
 290              schedulingStep < currentStep + schedulingPeriod;
 291              schedulingStep++)
 292         {
 293             Time time = getTime(schedulingStep);
 294             // Have signallers signal any special treatment of scheduling step
 295             for (const auto& signaller : signallerList)
 296             {
 297                 signaller.signal(schedulingStep, time);
 298             }
 299             // Query all elements whether they need to run at scheduling step
 300             for (const auto& element : signallerList)
 301             {
 302                 element.schedule(schedulingStep, time, registerRunFunction_);
 303             }
 304         }
 305     }
 306
 307 ### Sequence diagrams
 308
 309 #### Pre-loop
 310 In the loop preparation, the signallers and elements are created and
 311 stored in the right order. The signallers and elements can then
 312 perform any setup operations needed.
 313
 314 \msc
 315 hscale="2";
 316
 317 ModularSimulatorBuilder [label="ModularSimulatorAlgorithmBuilder"],
 318 ModularSimulator [label="ModularSimulatorAlgorithm"],
 319 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
 320 Elements [label="ModularSimulatorAlgorithm::\nElements"],
 321 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
 322
 323 --- [ label = "constructElementsAndSignallers()" ];
 324     ModularSimulatorBuilder => Signallers [ label = "Create signallers\nand order them" ];
 325     ModularSimulatorBuilder => Elements [ label = "Create elements\nand order them" ];
 326 --- [ label = "constructElementsAndSignallers()" ];
 327 |||;
 328 |||;
 329
 330 --- [ label = "setupAllElements()" ];
 331     ModularSimulator => Signallers [ label = "Call setup()" ];
 332     Signallers box Signallers [ label = "for signaler in Signallers\n    signaller->setup()" ];
 333     |||;
 334     ModularSimulator => Elements [ label = "Call setup()" ];
 335     Elements box Elements [ label = "for element in Elements\n    element->setup()" ];
 336 --- [ label = "setupAllElements()" ];
 337 \endmsc
 338
 339 #### Main loop
 340 The main loop consists of two parts which are alternately run until the
 341 simulation stop criterion is met. The first part is the population of
 342 the task queue, which determines all tasks that will have to run to
 343 simulate the system for a given time period. In the current implementation,
 344 the scheduling period is set equal to the lifetime of the neighborlist.
 345 Once the tasks have been predetermined, the simulator runs them in order.
 346 This is the actual simulation computation, which can now run without any
 347 branching.
 348
 349 \msc
 350 hscale="2";
 351
 352 ModularSimulator [label="ModularSimulatorAlgorithm"],
 353 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
 354 Elements [label="ModularSimulatorAlgorithm::\nElements"],
 355 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
 356
 357 ModularSimulator box TaskQueue [ label = "loop: while(not lastStep)" ];
 358 ModularSimulator note TaskQueue [ label = "The task queue is empty. The simulation state is at step N.", textbgcolor="yellow" ];
 359 |||;
 360 |||;
 361 ModularSimulator box ModularSimulator [ label = "populateTaskQueue()" ];
 362 ModularSimulator =>> TaskQueue [ label = "Fill task queue with tasks until next neighbor-searching step" ];
 363 |||;
 364 |||;
 365 ModularSimulator note TaskQueue [ label = "The task queue now holds all tasks needed to move the simulation from step N to step N + nstlist. The simulation for these steps has not been performed yet, however. The simulation state is hence still at step N.", textbgcolor="yellow" ];
 366 |||;
 367 |||;
 368
 369 ModularSimulator => TaskQueue [ label = "Run all tasks in TaskQueue" ];
 370 TaskQueue box TaskQueue [label = "for task in TaskQueue\n    run task" ];
 371 TaskQueue note TaskQueue [ label = "All simulation computations are happening in this loop!", textbgcolor="yellow" ];
 372 |||;
 373 |||;
 374 ModularSimulator note TaskQueue [ label = "The task queue is now empty. The simulation state is at step N + nstlist.", textbgcolor="yellow" ];
 375 ModularSimulator box TaskQueue [ label = "end loop: while(not lastStep)" ];
 376
 377 \endmsc
 378
 379 #### Task scheduling
 380 A part of the main loop, the task scheduling in `populateTaskQueue()`
 381 allows the elements to push tasks to the task queue. For every scheduling
 382 step, the signallers are run first to give the elements information about
 383 the upcoming scheduling step. The scheduling routine elements are then
 384 called in order, allowing the elements to register their respective tasks.
 385
 386 \msc
 387 hscale="2";
 388
 389 ModularSimulator [label="ModularSimulatorAlgorithm"],
 390 Signallers [label="ModularSimulatorAlgorithm::\nSignallers"],
 391 Elements [label="ModularSimulatorAlgorithm::\nElements"],
 392 TaskQueue [label="ModularSimulatorAlgorithm::\nTaskQueue"];
 393
 394 --- [ label = "populateTaskQueue()" ];
 395     ModularSimulator box ModularSimulator [ label = "doDomainDecomposition()\ndoPmeLoadBalancing()" ];
 396     ModularSimulator =>> Elements [ label = "Update state and topology" ];
 397     |||;
 398     |||;
 399
 400     ModularSimulator note ModularSimulator [ label = "schedulingStep == N\nsimulationStep == N", textbgcolor="yellow" ];
 401     ModularSimulator box TaskQueue [ label = "loop: while(not nextNeighborSearchingStep)" ];
 402         ModularSimulator => Signallers [ label = "Run signallers for schedulingStep" ];
 403         Signallers box Signallers [label = "for signaller in Signallers\n    signaller->signal(scheduleStep)" ];
 404         Signallers =>> Elements [ label = "notify" ];
 405         Signallers note Elements [ label = "The elements now know if schedulingStep has anything special happening, e.g. neighbor searching, log writing, trajectory writing, ...", textbgcolor="yellow" ];
 406         |||;
 407         |||;
 408
 409         ModularSimulator => Elements [ label = "Schedule run functions for schedulingStep" ];
 410         Elements box Elements [label = "for element in Elements\n    element->scheduleTask(scheduleStep)" ];
 411         Elements =>> TaskQueue [ label = "Push task" ];
 412         Elements note TaskQueue [ label = "The elements have now registered everything they will need to do for schedulingStep.", textbgcolor="yellow" ];
 413         ModularSimulator note ModularSimulator [ label = "schedulingStep++", textbgcolor="yellow" ];
 414
 415     ModularSimulator box TaskQueue [ label = "end loop: while(not nextNeighborSearchingStep)" ];
 416 --- [ label = "populateTaskQueue()" ];
 417 ModularSimulator note ModularSimulator [ label = "schedulingStep == N + nstlist\nsimulationStep == N", textbgcolor="yellow" ];
 418
 419 \endmsc
 420
 421 ## Acceptance tests and further plans
 422
 423 Acceptance tests which need to be
 424 fulfilled to make the modular simulator the default code path:
 425 * End-to-end tests pass on both `do_md` and the new loop in
 426   Gitlab pre- and post-submit pipelines
 427 * Physical validation cases pass on the new loop
 428 * Performance on different sized benchmark cases, x86 CPU-only
 429   and NVIDIA GPU are at most 1% slower -
 430   https://github.com/ptmerz/gmxbenchmark has been developed to
 431   this purpose.
 432
 433 After the MD bare minimum, we will want to add support for
 434 * Pulling
 435 * Full support of GPU (current implementation does not support
 436 GPU update)
 437
 438 Using the new modular simulator framework, we will then explore
 439 adding new functionality to GROMACS, including
 440 * Monte Carlo barostat
 441 * hybrid MC/MD schemes
 442 * multiple-time-stepping integration
 443
 444 We will also explore optimization opportunities, including
 445 * re-use of the same queue if conditions created by user input are
 446   sufficiently favorable (by design or when observed)
 447 * simultaneous execution of independent tasks
 448
 449 We will probably not prioritize support for (and might consider
 450 deprecating from do_md in a future GROMACS version)
 451 * Simulated annealing
 452 * REMD
 453 * Simulated tempering
 454 * Multi-sim
 455 * Membrane embedding
 456 * QM/MM
 457 * FEP lambda vectors
 458 * Fancy mdp options for FEP output
 459 * MTTK
 460 * Essential dynamics
 461 * Constant acceleration groups
 462 * Ensemble-averaged restraints
 463 * Time-averaged restraints
 464 * Freeze, deform, cos-acceleration
 465
 466 ## Signaller and element details
 467
 468 The current implementation of the modular simulator consists of
 469 the following signallers and elements:
 470
 471 ### Signallers
 472
 473 All signallers have a list of pointers to clients, objects that
 474 implement a respective interface and get notified of events the
 475 signaller is communicating.
 476
 477 * `NeighborSearchSignaller`: Informs its clients whether the
 478   current step is a neighbor-searching step.
 479 * `LastStepSignaller`: Informs its clients when the current step
 480   is the last step of the simulation.
 481 * `LoggingSignaller`: Informs its clients whether output to the
 482   log file is written in the current step.
 483 * `EnergySignaller`: Informs its clients about energy related
 484   special steps, namely energy calculation steps, virial
 485   calculation steps, and free energy calculation steps.
 486 * `TrajectorySignaller`: Informs its clients if writing to
 487   trajectory (state [x/v/f] and/or energy) is planned for the
 488   current step.
 489
 490 ### Simulator Elements
 491
 492 #### `TrajectoryElement`
 493 The `TrajectoryElement` is calling its trajectory clients, passing them a valid
 494 output pointer and letting them write to trajectory. Unlike the
 495 legacy implementation, the trajectory element itself knows nothing
 496 about the data that is written to file - it is only responsible
 497 to inform clients about trajectory steps, and providing a valid
 498 file pointer to the objects that need to write to trajectory.
 499
 500 #### `ComputeGlobalsElement`
 501 The `ComputeGlobalsElement` encapsulates the legacy calls to
 502 `compute_globals`. While a new approach to the global reduction
 503 operations has been discussed, it is currently not part of this
 504 effort. This element therefore aims at offering an interface
 505 to the legacy implementation which is compatible with the new
 506 simulator approach.
 507
 508 The element currently comes in 3 (templated) flavors: the leap-frog case,
 509 the first call during a velocity-verlet integrator, and the second call
 510 during a velocity-verlet integrator. It is the responsibility of the
 511 simulator builder to place them at the right place of the
 512 integration algorithm.
 513
 514 #### `ForceElement` and `ShellFCElement`
 515 The `ForceElement` and the `ShellFCElement` encapsulate the legacy
 516 calls to `do_force` and `do_shellfc`, respectively. It is the
 517 responsibility of the simulator builder to place them at the right
 518 place of the integration algorithm. Moving forward, a version of these
 519 elements which would allow calling of do_force with subsets of the topology
 520 would be desirable to pave the way towards multiple time step integrators
 521 within modular simulator, allowing to integrate slower degrees of freedom
 522 at a different frequency than faster degrees of freedom.
 523
 524 #### `ConstraintElement`
 525 The constraint element is implemented for the two cases of constraining
 526 both positions and velocities, and only velocities. It does not change the constraint
 527 implementation itself, but replaces the legacy `constrain_coordinates`
 528 and `constrain_velocities` calls from update.h by elements implementing
 529 the ISimulatorElement interface and using the new data management.
 530
 531 #### `Propagator`
 532 The propagator element can, through templating, cover the different
 533 propagation types used in the currently implemented MD schemes. The
 534 combination of templating, static functions, and having only the
 535 inner-most operations in the static functions allows to have performance
 536 comparable to fused update elements while keeping easily re-orderable
 537 single instructions.
 538
 539 Currently, the (templated) implementation covers four cases:
 540  * *PositionsOnly:* Moves the position vector by the given time step
 541  * *VelocitiesOnly:* Moves the velocity vector by the given time step
 542  * *LeapFrog:* A manual fusion of the previous two propagators
 543  * *VelocityVerletPositionsAndVelocities:* A manual fusion of VelocitiesOnly
 544     and PositionsOnly, where VelocitiesOnly is only propagated by half the
 545     time step of PositionsOnly.
 546
 547 The propagators also allow to implement temperature and pressure coupling
 548 schemes by offering (templated) scaling of the velocities.
 549
 550 #### `CompositeSimulatorElement`
 551 The composite simulator element takes a list of elements and implements
 552 the ISimulatorElement interface, making a group of elements effectively
 553 behave as one. This simplifies building algorithms.
 554
 555 #### `VRescaleThermostat`
 556 The `VRescaleThermostat` implements the v-rescale thermostat. It takes a
 557 callback to the propagator and updates the velocity scaling factor
 558 according to the v-rescale thermostat formalism.
 559
 560 #### `ParrinelloRahmanBarostat`
 561 The `ParrinelloRahmanBarostat` implements the Parrinello-Rahman barostat.
 562 It integrates the Parrinello-Rahman box velocity equations, takes a
 563 callback to the propagator to update the velocity scaling factor, and
 564 scales the box and the positions of the system.
 565
 566 #### `StatePropagatorData::Element`
 567 The `StatePropagatorData::Element` takes part in the simulator run, as it might
 568 have to save a valid state at the right moment during the
 569 integration. Placing the StatePropagatorData correctly is for now the
 570 duty of the simulator builder - this might be automated later
 571 if we have enough meta-data of the variables (i.e., if
 572 `StatePropagatorData` knows at which time the variables currently are,
 573 and can decide when a valid state (full-time step of all
 574 variables) is reached. The `StatePropagatorData::Element` is also a client of
 575 both the trajectory signaller and writer - it will save a
 576 state for later writeout during the simulator step if it
 577 knows that trajectory writing will occur later in the step,
 578 and it knows how to write to file given a file pointer by
 579 the `TrajectoryElement`.
 580
 581 #### `EnergyData::Element`
 582 The `EnergyData::Element` takes part in the simulator run,
 583 either adding data (at energy calculation steps), or
 584 recording a non-calculation step (all other steps). It is the
 585 responsibility of the simulator builder to ensure that the
 586 `EnergyData::Element` is called at a point of the simulator run
 587 at which it has access to a valid energy state.
 588
 589 It subscribes to the trajectory signaller, the energy signaller,
 590 and the logging signaller to know when an energy calculation is
 591 needed and when a non-recording step is enough. The EnergyData
 592 element is also a subscriber to the trajectory writer element,
 593 as it is responsible to write energy data to trajectory.
 594
 595 #### `FreeEnergyPerturbationData::Element`
 596 The `FreeEnergyPerturbationData::Element` is a member class of
 597 `FreeEnergyPerturbationData` that updates the lambda
 598 values during the simulation run if lambda is non-static. It
 599 implements the checkpointing client interface to save the current
 600 state of `FreeEnergyPerturbationData` for restart.
 601
 602 ## Data structures
 603
 604 ### `StatePropagatorData`
 605 The `StatePropagatorData` contains a little more than the pure
 606 statistical-physical micro state, namely the positions,
 607 velocities, forces, and box matrix, as well as a backup of
 608 the positions and box of the last time step. While it takes
 609 part in the simulator loop to be able to backup positions /
 610 boxes and save the current state if needed, it's main purpose
 611 is to offer access to its data via getter methods. All elements
 612 reading or writing to this data need a pointer to the
 613 `StatePropagatorData` and need to request their data explicitly. This
 614 will later simplify the understanding of data dependencies
 615 between elements.
 616
 617 Note that the `StatePropagatorData` can be converted to and from the
 618 legacy `t_state` object. This is useful when dealing with
 619 functionality which has not yet been adapted to use the new
 620 data approach. Of the elements currently implemented, only
 621 domain decomposition, PME load balancing, and the initial
 622 constraining are using this.
 623
 624 ### `EnergyData`
 625 The EnergyData owns the EnergyObject, and is hence responsible
 626 for saving energy data and writing it to trajectory.
 627 The EnergyData offers an interface to add virial contributions,
 628 but also allows access to the raw pointers to tensor data, the
 629 dipole vector, and the legacy energy data structures.
 630
 631 ### `FreeEnergyPerturbationData`
 632 The `FreeEnergyPerturbationData` holds the lambda vector and the
 633 current FEP state, offering access to its values via getter
 634 functions.
 635
 636 ## Simulator algorithm builder
 637 Elements that define the integration algorithm (i.e. which are
 638 added using the templated `ModularSimulatorAlgorithmBuilder::add`
 639 method) need to implement a `getElementPointerImpl` factory function.
 640 This gives them access to the data structures and some other
 641 infrastructure, but also allows elements to accept additional
 642 arguments (e.g frequency, offset, ...).
 643
 644     template<typename Element, typename... Args>
 645     void ModularSimulatorAlgorithmBuilder::add(Args&&... args)
 646     {
 647         // Get element from factory method
 648         auto* element = static_cast<Element*>(getElementPointer<Element>(
 649                 legacySimulatorData_, &elementAdditionHelper_, statePropagatorData_.get(),
 650                 energyData_.get(), freeEnergyPerturbationData_.get(), &globalCommunicationHelper_,
 651                 std::forward<Args>(args)...));
 652
 653         // Make sure returned element pointer is owned by *this
 654         // Ensuring this makes sure we can control the life time
 655         if (!elementExists(element))
 656         {
 657             throw ElementNotFoundError("Tried to append non-existing element to call list.");
 658         }
 659
 660         // Register element with infrastructure
 661     }
 662
 663 Note that `getElementPointer<Element>` will call `Element::getElementPointerImpl`,
 664 which needs to be implemented by the different elements.
 665
 666 ## Infrastructure
 667 ### `DomDecHelper` and `PmeLoadBalanceHelper`
 668 These infrastructure elements are responsible for domain decomposition and
 669 PME load balancing, respectively. They encapsulate function calls which are
 670 important for performance, but outside the scope of this effort. They rely
 671 on legacy data structures for the state (both) and the topology (domdec).
 672
 673 The elements do not implement the ISimulatorElement interface, as
 674 the Simulator is calling them explicitly between task queue population
 675 steps. This allows elements to receive the new topology / state before
 676 deciding what functionality they need to run.
 677
 678 ### Checkpointing
 679 The `CheckpointHelper` is responsible to write checkpoints, and to offer
 680 its clients access to the data read from checkpoint.
 681
 682 Writing checkpoints is done just before neighbor-searching (NS) steps,
 683 or before the last step. Checkpointing occurs periodically (by default,
 684 every 15 minutes), and needs two NS steps to take effect - on the first
 685 NS step, the checkpoint helper on master rank signals to all other ranks
 686 that checkpointing is about to occur. At the next NS step, the checkpoint
 687 is written. On the last step, checkpointing happens immediately before the
 688 step (no signalling). To be able to react to last step being signalled,
 689 the CheckpointHelper also implements the `ISimulatorElement` interface,
 690 but only registers a function if the last step has been called.
 691
 692 Checkpointing happens at the top of a simulation step, which gives a
 693 straightforward re-entry point at the top of the simulator loop.
 694
 695 #### Implementation details
 696 ##### Other (older) approaches
 697 **Legacy checkpointing approach:** All data to be checkpointed needs to be
 698 stored in one of the following data structures:
 699 * `t_state`, which also holds pointers to
 700   - `history_t` (history for restraints)
 701   - `df_history_t` (history for free energy)
 702   - `ekinstate`
 703   - `AwhHistory`
 704 * `ObservableHistory`, consisting of
 705   - `energyhistory_t`
 706   - `PullHistory`
 707   - `edsamhistory_t`
 708   - `swaphistory_t`
 709 * Checkpoint further saves details about the output files being used
 710
 711 These data structures are then serialized by a function having knowledge of
 712 their implementation details. One possibility to add data to the checkpoint
 713 is to expand one of the objects that is currently being checkpointed, and
 714 edit the respective `do_cpt_XXX` function in `checkpoint.cpp` which interacts
 715 with the XDR library. The alternative would be to write an entirely new data
 716 structure, changing the function signature of all checkpoint-related functions,
 717 and write a corresponding low-level routine interacting with the XDR library.
 718
 719 **The MdModule approach:** To allow for modules to write checkpoints, the legacy
 720 checkpoint was extended by a KVTree. When writing to checkpoint, this tree gets
 721 filled (via callbacks) by the single modules, and then serialized. When reading,
 722 the KVTree gets deserialized, and then distributed to the modules which can read
 723 back the data previously stored.
 724
 725 ##### Modular simulator design
 726
 727 The MdModule checks off almost all requirements to a modularized checkpointing format.
 728 The proposed design is therefore an evolved form of this approach. Notably, two
 729 improvements include
 730 * Hide the implementation details of the data structure holding the data (currently,
 731   a KV-Tree) from the clients. This allows to change the implementation details of
 732   reading / writing checkpoints without touching client code.
 733 * Offer a unified way to read and write to data, allowing clients to write one
 734   (templated) function to read to and write from checkpoint. This allows to
 735   eliminate code duplication and the danger of having read and write functions
 736   getting out of sync.
 737
 738 The modular simulator checkpointing does not currently change the way that the
 739 legacy simulator is checkpointing. Some data structures involved in the legacy
 740 checkpoint did, however, get an implementation of the new approach. This is
 741 needed for ModularSimulator checkpointing, but also gives a glimpse of how
 742 implementing this for legacy data structures would look like.
 743
 744 The most important design part is the `CheckpointData` class. It exposes methods
 745 to read and write scalar values, ArrayRefs, and tensors. It also allows to create
 746 a "sub-object" of the same type `CheckpointData` which allows to have more complex
 747 members implement their own checkpointing routines (without having to be aware that
 748 they are a member). All methods are templated on the chosen operation,
 749 `CheckpointDataOperation::Read` or `CheckpointDataOperation::Write`, allowing clients
 750 to use the same code to read and write to checkpoint. Type traits and constness are
 751 used to catch as many errors as possible at compile time. `CheckpointData` uses a
 752 KV-tree to store the data internally. This is however never exposed to the client.
 753 Having this abstraction layer gives freedom to change the internal implementation
 754 in the future.
 755
 756 All `CheckpointData` objects are owned by a `ReadCheckpointDataHolder` or
 757 `WriteCheckpointDataHolder`. These holder classes own the internal KV-tree, and offer
 758 `deserialize(ISerializer*)` and `serialize(ISerializer*)` functions, respectively,
 759 which allow to read from / write to file. This separation clearly defines ownership
 760 and separates the interface aimed at file IO from the interface aimed at objects
 761 reading / writing checkpoints.
 762
 763 Checkpointing for modular simulator is tied in the general checkpoint facility by
 764 passing a `ReadCheckpointDataHolder` or `WriteCheckpointDataHolder` object to the
 765 legacy checkpoint read and write operations.
 766
 767 ##### Notes about the modular simulator checkpointing design
 768
 769 **Distinction of data between clients:** The design requires that separate
 770 clients have independent sub-`CheckpointData` objects identified by a unique key.
 771 This key is the only thing that needs to be unique between clients, i.e. clients are
 772 free to use any key _within_ their sub-`CheckpointData` without danger to overwrite
 773 data used by other clients.
 774
 775 **Versioning:** The design relies on clients keeping their own versioning system
 776 within their sub-`CheckpointData` object. As the data stored by clients is opaque
 777 to the top level checkpointing facility, it has no way to know when the internals
 778 change. Only fundamental changes to the checkpointing architecture can still be
 779 tracked by a top-level checkpoint version.
 780
 781 **Key existence:** The currently uploaded design does not allow to check whether
 782 a key is present in `CheckpointData`. This could be introduced if needed - however,
 783 if clients write self-consistent read and write code, this should never be needed.
 784 Checking for key existence seems rather to be a lazy way to circumvent versioning,
 785 and is therefore discouraged.
 786
 787 **Callback method:** The modular simulator and MdModules don't use the exact same
 788 way of communicating with clients. The two methods could be unified if needed.
 789 The only _fundamental_ difference is that modular simulator clients need to identify
 790 with a unique key to receive their dedicated sub-data, while MdModules all read from
 791 and write to the same KV-tree. MdModules could be adapted to that by either requiring
 792 a unique key from the modules, or by using the same `CheckpointData` for all modules
 793 and using a single unique key (something like "MdModules") to register that object
 794 with the global checkpoint.
 795
 796 **Performance:** One of the major differences between the new approach and the legacy
 797 checkpointing is that data gets _copied_ into `CheckpointData`, while the legacy
 798 approach only took a pointer to the data and serialized it. This slightly reduces
 799 performance. Some thoughts on that:
 800 * By default, checkpointing happens at the start of the simulation (only if reading
 801 from checkpoint), every 15 minutes during simulation runs, and at the end of the
 802 simulation run. This makes it a low priority target for optimization. Consequently,
 803 not much thoughts have been put in the optimization, but there's certainly some way
 804 to improve things here and there if we consider it necessary.
 805 * The copying will only have measurable effect when large data gets checkpointed -
 806 likely only for saving the positions / velocities of the entire state, so that
 807 should be the first target for optimization if needed.
 808 * Copying data could have advantages moving forward - we could continue the
 809 simulation as soon as the data is written to the `CheckpointData` object, and don't
 810 necessarily need to wait for writing to the physical medium to happen. It also
 811 simplifies moving the point at which checkpointing is performed within the
 812 integrator. One could envision clients storing their data any time during the
 813 integration step, and serializing the resulting `CheckpointData` after the step.
 814 This avoids the need to find a single point within the loop at which all clients
 815 need to be in a state suitable for checkpointing.
 816 * If, however, we wanted to use the same approach for other, more frequent
 817 (and hence more perfomance critical) operations such as saving/restoring states
 818 for MC type algorithms or swapping of states between running simulations in
 819 multi-sim type settings, performance would become more of an issue.
 820
 821 **ISerializer vs KV-tree:** The new approach uses a KV tree internally. The
 822 KV tree is well suited to represent the design philosophy of the approach:
 823 Checkpointing offers a functionality which allows clients to write/read any data
 824 they want. This data remains opaque to the checkpointing element. Clients can
 825 read or write in any order, and in the future, maybe even simultaneously. Data
 826 written by any element should be protected from access from other elements to
 827 avoid bugs. The downside of the KV tree is that all data gets copied before
 828 getting written to file (see above).
 829
 830 Replacing a KV tree by a single ISerializer object which gets passed to clients
 831 would require elements to read and write sequentially in a prescribed order. With
 832 the help of InMemorySerializer, a KV-Tree could likely be emulated (with sub-objects
 833 that serialize to memory, and then a parent object that serializes this memory to
 834 file), but that doesn't present a clear advantage anymore.
 835
 836 ### `TopologyHolder`
 837 The topology object owns the local topology and holds a constant reference
 838 to the global topology owned by the ISimulator.
 839
 840 The local topology is only infrequently changed if domain decomposition is
 841 on, and never otherwise. The topology holder therefore offers elements to register
 842 as ITopologyHolderClients. If they do so, they get a handle to the updated local
 843 topology whenever it is changed, and can rely that their handle is valid
 844 until the next update. The domain decomposition element is defined as friend
 845 class to be able to update the local topology when needed.