docs/nblib/listed-data-format.rst

   1 Design goals and motivation for the data format of bonded forces in NB-LIB
   2 --------------------------------------------------------------------------
   3
   4
   5 The current format for listed forces in GROMACS looks like this:
   6
   7 .. code:: cpp
   8
   9    struct InteractionDefinitions
  10    {
  11        std::vector<t_iparams> iparams;
  12        std::array<std::vector<int>, F_NRE> il;
  13    };
  14
  15 The format covers all interaction types, i.e. \ ``t_iparams`` is a union
  16 type which can hold the parameters of any type.
  17 The other member called ``il`` contains the
  18 indices for each interaction type, where ``F_NRE`` is the number of
  19 interaction types that GROMACS supports. More precisely, each
  20 member of ``il``, a ``std::vector<int>``, is a flattened list of all
  21 interactions for a given interaction type. The vector contains ``N+1`` integer indices
  22 for each interaction, where ``N`` is the number of particles that are
  23 involved in the interaction. An additional index is needed to retrieve
  24 the correct parameters in ``iparams``, hence the total number of indices sums up
  25 to ``N+1`` per interaction.
  26
  27 The big advantage of storing all types in a union data type is (was),
  28 that it allows looping over all types with a simple for-loop.
  29 In pre C++11 and perhaps even pre C++14 times, looping over different
  30 types was a big hassle and the union data type approach likely was the
  31 only practicable solution. One downside of this approach, however, is
  32 that with just a single (union) type, one can't leverage the compiler's
  33 type system, most importantly static branching, for example with overload resolution.
  34 As a consequence, only dynamic branching with ``if`` statements remains.
  35
  36 Consider, for instance, the implementation of the top-level
  37 ``calc_listed(const InteractionDefinitions& idef, ...)`` in GROMACS, which in its essence,
  38 looks like this:
  39
  40 .. code:: cpp
  41
  42    void calc_listed(const InteractionDefinitions& idef, ...)
  43    {
  44        // manage timing and multi-threading
  45
  46        for (int ftype = 0; ftype < F_NRE; ++type)
  47        {
  48            // branch out and descend stack for 2 intermediate functions based on
  49            // the type of interaction that ftype corresponds to
  50            // then call a function from a pointer table
  51
  52            bondFunction* bonded = bondedInteractionFunctions[ftype];
  53
  54            // compute all forces for ftype
  55            bonded(idef.iparams, idef.il[ftype], ...);
  56        }
  57
  58        // reduce thread output
  59    }
  60
  61 GROMACS supports a lot of different listed interaction types, such as different
  62 types of bonds, angles and proper and improper dihedrals. These different types
  63 require different handling and finally the right force kernel chosen from a table
  64 of function pointers.
  65 The handling code required to correctly branch out to all the different cases
  66 results in quite a deep call stack, a lot of branching logic and ends up accounting
  67 for a fair part of the overall complexity, which should ideally just consist of
  68 the type-specific force calculation implementations.
  69
  70
  71 A type-aware approach to listed forces
  72 --------------------------------------
  73
  74 NB-LIB aims to reduce the overall code complexity with a type-aware data format
  75 where each interaction type is implemented as a separate (C++)-type.
  76 The format for a given interaction type looks like this:
  77
  78 .. code:: cpp
  79
  80    template <class Interaction>
  81    struct InteractionData
  82    {
  83        std::vector<Index<Interaction>> indices;
  84        std::vector<Interaction>        parameters;
  85    };
  86
  87 For each type of interaction, we store the interaction indices plus the
  88 interaction parameters. While the (C++)-types are different, the actual data stored is
  89 exactly the same: ``N+1`` integer indices per ``N``-center interaction plus the unique parameters.
  90 An example for ``Interaction`` would be ``HarmonicBond``, the public part of which looks like this:
  91
  92 .. code:: cpp
  93
  94    class HarmonicBond
  95    {
  96    public:
  97        // return lvalue ref for use with std::tie
  98        // in order to leverage std::tuple comparison ops
  99        const real& forceConstant();
 100        const real& equilDistance();
 101    };
 102
 103 The ``Index`` traits class deduces to ``std::array<int, 3>``, because
 104 for each harmonic bond, we need two ``int``\ s for the coordinate
 105 indices and a third ``int`` to look up the bond parameters in the
 106 ``parameters`` vector. For angles and dihedrals, the ``Index`` trait
 107 would add an additional one or two ``int``\ s to hold the additional
 108 coordinate indices.
 109
 110 Finally, we gather all types of interactions in a
 111 ``std::tuple``, such that the complete definition for listed forces
 112 in NB-LIB looks like this:
 113
 114 .. code:: cpp
 115
 116    using ListedInteractions = std::tuple<InteractionData<HarmonicBond>, ..., InteractionData<HarmonicAngle>, ...>;
 117
 118 One important property of ``ListedInteractions`` is that it stores exactly the same information as ``InteractionDefinitions``
 119 and therefore conversion in either direction is easy to implement.
 120
 121
 122 The NB-LIB listed forces pipeline
 123 ---------------------------------
 124
 125 Given the listed interaction data provided in the format described above,
 126 the steps required to calculate the corresponding forces
 127 are, in brief:
 128
 129   * Loop over all interaction types
 130   * Loop over all interactions for given type
 131   * Call interaction type kernel, store forces and return energy
 132
 133
 134 This procedure is identical to the current implementation in GROMACS.
 135 In actual code, the first step looks like this:
 136
 137 .. code:: cpp
 138
 139    template<class Buffer, class Pbc>
 140    auto reduceListedForces(const ListedInteractions& interactions,
 141                            const std::vector<gmx::RVec>& x,
 142                            Buffer* forces,
 143                            const Pbc& pbc)
 144    {
 145        std::array<real, std::tuple_size<ListedInteractions>::value> energies;
 146
 147        // lambda function, will be applied to each type
 148        auto computeForceType = [forces, &x, &energies, &pbc](const auto& ielem) {
 149            real energy = computeForces(ielem.indices, ielem.parameters, x, forces, pbc);
 150            energies[FindIndex<std::decay_t<decltype(ilem)>, ListedInteractions>{}] = energy;
 151        };
 152
 153        // apply the lambda to all bond types
 154        for_each_tuple(computeForceType, interactions);
 155
 156        return energies;
 157    }
 158
 159 With the help of a generic lambda and C++17’s ``std::apply`` in the
 160 one-liner ``for_each_tuple``, we can generate the loop over the
 161 different types in the tuple quite effortlessly. While
 162 ``reduceListedForces`` implements a loop over the interaction types, the
 163 next layer, ``computeForces`` implements a loop over all interactions of
 164 a given type:
 165
 166 .. code:: cpp
 167
 168    template <class Index, class InteractionType, class Buffer, class Pbc>
 169    real computeForces(const std::vector<Index>& indices,
 170                       const std::vector<InteractionType>& iParams,
 171                       const std::vector<gmx::RVec>& x,
 172                       Buffer* forces,
 173                       const Pbc& pbc)
 174    {
 175        real Epot = 0.0;
 176
 177        for (const auto& index : indices)
 178        {
 179            Epot += dispatchInteraction(index, iParams, x, forces);
 180        }
 181
 182        return Epot;
 183    }
 184
 185 Compared to the union data type approach where this loop has been manually
 186 implemented for all interaction types, in NB-LIB, only a single implementation
 187 is required.
 188
 189 We’re now down to the level of individual bonds, angles and dihedrals.
 190 At this point, the next steps depend on the actual type of the
 191 interaction. But instead of dispatching each harmonic bond, cubic bond,
 192 harmonic angle and so on to their seperate paths just yet, we just
 193 differentiate based on the number of interaction centers for now.
 194 Through overload resolution, the appropriate version
 195 ``dispatchInteraction`` gets called now, such as this one for the case
 196 of 2-center interactions:
 197
 198 .. code:: cpp
 199
 200    template <class Buffer, class TwoCenterType, class Pbc>
 201    std::enable_if_t<IsTwoCenter<TwoCenterType>::value, real>
 202    dispatchInteraction(const InteractionIndex<TwoCenterType>& index,
 203                        const std::vector<TwoCenterType>& bondInstances,
 204                        const std::vector<gmx::RVec>& x,
 205                        Buffer* forces,
 206                        const Pbc& pbc)
 207    {
 208        int i = std::get<0>(index);
 209        int j = std::get<1>(index);
 210        const gmx::RVec& x1 = x[i];
 211        const gmx::RVec& x2 = x[j];
 212        const TwoCenterType& bond = bondInstances[std::get<2>(index)];
 213
 214        gmx::RVec dx;
 215        // calculate x1 - x2 modulo pbc
 216        pbc.dxAiuc(x1, x2, dx);
 217        real dr2 = dot(dx, dx);
 218        real dr  = std::sqrt(dr2);
 219
 220        auto [force, energy] = bondKernel(dr, bond);
 221
 222        // avoid division by 0
 223        if (dr2 != 0.0)
 224        {
 225            force /= dr;
 226            detail::spreadTwoCenterForces(force, dx, &(*forces)[i], &(*forces)[j]);
 227        }
 228
 229        return energy;
 230    }
 231
 232 We can again observe that common parts among different 2-center interaction types
 233 are reused. The common parts are
 234
 235  * coordinate retrieval
 236  * computation of the scalar distance
 237  * spreading of the scalar part of the force to the two centers
 238
 239 The only remaining thing to do now is to call the actual
 240 kernel to compute the force. Since ``bond`` has a distinct type, we can
 241 again use overload resolution:
 242
 243 .. code:: cpp
 244
 245    template <class T>
 246    auto bondKernel(T dr, const HarmonicBond& bond)
 247    {
 248        return harmonicScalarForce(bond.forceConstant(), bond.equilDistance(), dr);
 249    }
 250
 251 and call the actual kernel, which in its simplest form for a harmonic
 252 bond looks like this:
 253
 254 .. code:: cpp
 255
 256    template <class T>
 257    std::tuple<T, T> harmonicScalarForce(T k, T x0, T x)
 258    {
 259        real dx  = x - x0;
 260        real dx2 = dx * dx;
 261
 262        real force = -k * dx;
 263        real epot = 0.5 * k * dx2;
 264
 265        return std::make_tuple(force, epot);
 266
 267        /* That was 6 flops */
 268    }
 269
 270 That’s it! The approach outlined here manages to reuse (between different types)
 271 a significant part of the code that feeds input data to force kernels.
 272 Notably, not a single ``if(ftype)`` is required to implement the control flow.
 273 The remaining parts for a feature complete implementation are
 274 overloads of ``dispatchInteraction`` for the 3- to 5-center interactions and
 275 the type-aware wrappers for all the different kernels implemented in
 276 GROMACS. They have been omitted for brevity.
 277
 278 A note on **multithreading**: multithreading is handled above the top-level
 279 ``reduceListedForces`` described here. For parallel execution, the
 280 input ``ListedInteractions`` tuple is split into ``nThreads`` parts and a
 281 ``Buffer`` object is set up for each thread. ``reduceListedForces`` is then
 282 called once by each thread with the assigned fraction of ``ListedInteractions``
 283 and the ``Buffer`` as argument.
 284 The lifetime of the ``ListedInteractions`` splits is coupled to the domain decomposition.
 285
 286 Summary
 287 -------
 288
 289 NB-LIB listed forces employs a (C++)-type aware data format that
 290 is otherwise equivalent to its counter-part in GROMACS.
 291 The type-aware data format is then used to simplify the "routing" layer that
 292 connects data input to the appropriate kernels. Thanks to static branching and polymorphism,
 293 increased code reuse and simplified branching logic could be achieved.
 294 **The force kernels themselves do not need to be changed and NB-LIB refers to
 295 GROMACS for their implementation.**
 296
 297
 298 Outlook
 299 -------
 300
 301 The data flow management for listed forces described here allows further
 302 improvements to be implemented:
 303
 304 * Aggregate interaction types: fuse interactions of different types into
 305   aggregated types. For example, a dihedral interaction and the bonds and angles
 306   that are present among the same four particle indices can be combined into a single
 307   aggregated interaction. This allows to reuse the particle coordinates loaded from memory
 308   for multiple types and also combines the store operations for the forces.
 309   Type aggregates also likely simplify an efficient GPU implementation of listed forces.
 310
 311 * Separation of a topology containing both parameter sets for a system state A and B into two
 312   separate topologies for the A and B system states.