1 Design goals and motivation for the data format of bonded forces in NB-LIB
2 --------------------------------------------------------------------------
5 The current format for listed forces in GROMACS looks like this:
9 struct InteractionDefinitions
11 std::vector<t_iparams> iparams;
12 std::array<std::vector<int>, F_NRE> il;
15 The format covers all interaction types, i.e. \ ``t_iparams`` is a union
16 type which can hold the parameters of any type.
17 The other member called ``il`` contains the
18 indices for each interaction type, where ``F_NRE`` is the number of
19 interaction types that GROMACS supports. More precisely, each
20 member of ``il``, a ``std::vector<int>``, is a flattened list of all
21 interactions for a given interaction type. The vector contains ``N+1`` integer indices
22 for each interaction, where ``N`` is the number of particles that are
23 involved in the interaction. An additional index is needed to retrieve
24 the correct parameters in ``iparams``, hence the total number of indices sums up
25 to ``N+1`` per interaction.
27 The big advantage of storing all types in a union data type is (was),
28 that it allows looping over all types with a simple for-loop.
29 In pre C++11 and perhaps even pre C++14 times, looping over different
30 types was a big hassle and the union data type approach likely was the
31 only practicable solution. One downside of this approach, however, is
32 that with just a single (union) type, one can't leverage the compiler's
33 type system, most importantly static branching, for example with overload resolution.
34 As a consequence, only dynamic branching with ``if`` statements remains.
36 Consider, for instance, the implementation of the top-level
37 ``calc_listed(const InteractionDefinitions& idef, ...)`` in GROMACS, which in its essence,
42 void calc_listed(const InteractionDefinitions& idef, ...)
44 // manage timing and multi-threading
46 for (int ftype = 0; ftype < F_NRE; ++type)
48 // branch out and descend stack for 2 intermediate functions based on
49 // the type of interaction that ftype corresponds to
50 // then call a function from a pointer table
52 bondFunction* bonded = bondedInteractionFunctions[ftype];
54 // compute all forces for ftype
55 bonded(idef.iparams, idef.il[ftype], ...);
58 // reduce thread output
61 GROMACS supports a lot of different listed interaction types, such as different
62 types of bonds, angles and proper and improper dihedrals. These different types
63 require different handling and finally the right force kernel chosen from a table
65 The handling code required to correctly branch out to all the different cases
66 results in quite a deep call stack, a lot of branching logic and ends up accounting
67 for a fair part of the overall complexity, which should ideally just consist of
68 the type-specific force calculation implementations.
71 A type-aware approach to listed forces
72 --------------------------------------
74 NB-LIB aims to reduce the overall code complexity with a type-aware data format
75 where each interaction type is implemented as a separate (C++)-type.
76 The format for a given interaction type looks like this:
80 template <class Interaction>
81 struct InteractionData
83 std::vector<Index<Interaction>> indices;
84 std::vector<Interaction> parameters;
87 For each type of interaction, we store the interaction indices plus the
88 interaction parameters. While the (C++)-types are different, the actual data stored is
89 exactly the same: ``N+1`` integer indices per ``N``-center interaction plus the unique parameters.
90 An example for ``Interaction`` would be ``HarmonicBond``, the public part of which looks like this:
97 // return lvalue ref for use with std::tie
98 // in order to leverage std::tuple comparison ops
99 const real& forceConstant();
100 const real& equilDistance();
103 The ``Index`` traits class deduces to ``std::array<int, 3>``, because
104 for each harmonic bond, we need two ``int``\ s for the coordinate
105 indices and a third ``int`` to look up the bond parameters in the
106 ``parameters`` vector. For angles and dihedrals, the ``Index`` trait
107 would add an additional one or two ``int``\ s to hold the additional
110 Finally, we gather all types of interactions in a
111 ``std::tuple``, such that the complete definition for listed forces
112 in NB-LIB looks like this:
116 using ListedInteractions = std::tuple<InteractionData<HarmonicBond>, ..., InteractionData<HarmonicAngle>, ...>;
118 One important property of ``ListedInteractions`` is that it stores exactly the same information as ``InteractionDefinitions``
119 and therefore conversion in either direction is easy to implement.
122 The NB-LIB listed forces pipeline
123 ---------------------------------
125 Given the listed interaction data provided in the format described above,
126 the steps required to calculate the corresponding forces
129 * Loop over all interaction types
130 * Loop over all interactions for given type
131 * Call interaction type kernel, store forces and return energy
134 This procedure is identical to the current implementation in GROMACS.
135 In actual code, the first step looks like this:
139 template<class Buffer, class Pbc>
140 auto reduceListedForces(const ListedInteractions& interactions,
141 const std::vector<gmx::RVec>& x,
145 std::array<real, std::tuple_size<ListedInteractions>::value> energies;
147 // lambda function, will be applied to each type
148 auto computeForceType = [forces, &x, &energies, &pbc](const auto& ielem) {
149 real energy = computeForces(ielem.indices, ielem.parameters, x, forces, pbc);
150 energies[FindIndex<std::decay_t<decltype(ilem)>, ListedInteractions>{}] = energy;
153 // apply the lambda to all bond types
154 for_each_tuple(computeForceType, interactions);
159 With the help of a generic lambda and C++17’s ``std::apply`` in the
160 one-liner ``for_each_tuple``, we can generate the loop over the
161 different types in the tuple quite effortlessly. While
162 ``reduceListedForces`` implements a loop over the interaction types, the
163 next layer, ``computeForces`` implements a loop over all interactions of
168 template <class Index, class InteractionType, class Buffer, class Pbc>
169 real computeForces(const std::vector<Index>& indices,
170 const std::vector<InteractionType>& iParams,
171 const std::vector<gmx::RVec>& x,
177 for (const auto& index : indices)
179 Epot += dispatchInteraction(index, iParams, x, forces);
185 Compared to the union data type approach where this loop has been manually
186 implemented for all interaction types, in NB-LIB, only a single implementation
189 We’re now down to the level of individual bonds, angles and dihedrals.
190 At this point, the next steps depend on the actual type of the
191 interaction. But instead of dispatching each harmonic bond, cubic bond,
192 harmonic angle and so on to their seperate paths just yet, we just
193 differentiate based on the number of interaction centers for now.
194 Through overload resolution, the appropriate version
195 ``dispatchInteraction`` gets called now, such as this one for the case
196 of 2-center interactions:
200 template <class Buffer, class TwoCenterType, class Pbc>
201 std::enable_if_t<IsTwoCenter<TwoCenterType>::value, real>
202 dispatchInteraction(const InteractionIndex<TwoCenterType>& index,
203 const std::vector<TwoCenterType>& bondInstances,
204 const std::vector<gmx::RVec>& x,
208 int i = std::get<0>(index);
209 int j = std::get<1>(index);
210 const gmx::RVec& x1 = x[i];
211 const gmx::RVec& x2 = x[j];
212 const TwoCenterType& bond = bondInstances[std::get<2>(index)];
215 // calculate x1 - x2 modulo pbc
216 pbc.dxAiuc(x1, x2, dx);
217 real dr2 = dot(dx, dx);
218 real dr = std::sqrt(dr2);
220 auto [force, energy] = bondKernel(dr, bond);
222 // avoid division by 0
226 detail::spreadTwoCenterForces(force, dx, &(*forces)[i], &(*forces)[j]);
232 We can again observe that common parts among different 2-center interaction types
233 are reused. The common parts are
235 * coordinate retrieval
236 * computation of the scalar distance
237 * spreading of the scalar part of the force to the two centers
239 The only remaining thing to do now is to call the actual
240 kernel to compute the force. Since ``bond`` has a distinct type, we can
241 again use overload resolution:
246 auto bondKernel(T dr, const HarmonicBond& bond)
248 return harmonicScalarForce(bond.forceConstant(), bond.equilDistance(), dr);
251 and call the actual kernel, which in its simplest form for a harmonic
252 bond looks like this:
257 std::tuple<T, T> harmonicScalarForce(T k, T x0, T x)
262 real force = -k * dx;
263 real epot = 0.5 * k * dx2;
265 return std::make_tuple(force, epot);
267 /* That was 6 flops */
270 That’s it! The approach outlined here manages to reuse (between different types)
271 a significant part of the code that feeds input data to force kernels.
272 Notably, not a single ``if(ftype)`` is required to implement the control flow.
273 The remaining parts for a feature complete implementation are
274 overloads of ``dispatchInteraction`` for the 3- to 5-center interactions and
275 the type-aware wrappers for all the different kernels implemented in
276 GROMACS. They have been omitted for brevity.
278 A note on **multithreading**: multithreading is handled above the top-level
279 ``reduceListedForces`` described here. For parallel execution, the
280 input ``ListedInteractions`` tuple is split into ``nThreads`` parts and a
281 ``Buffer`` object is set up for each thread. ``reduceListedForces`` is then
282 called once by each thread with the assigned fraction of ``ListedInteractions``
283 and the ``Buffer`` as argument.
284 The lifetime of the ``ListedInteractions`` splits is coupled to the domain decomposition.
289 NB-LIB listed forces employs a (C++)-type aware data format that
290 is otherwise equivalent to its counter-part in GROMACS.
291 The type-aware data format is then used to simplify the "routing" layer that
292 connects data input to the appropriate kernels. Thanks to static branching and polymorphism,
293 increased code reuse and simplified branching logic could be achieved.
294 **The force kernels themselves do not need to be changed and NB-LIB refers to
295 GROMACS for their implementation.**
301 The data flow management for listed forces described here allows further
302 improvements to be implemented:
304 * Aggregate interaction types: fuse interactions of different types into
305 aggregated types. For example, a dihedral interaction and the bonds and angles
306 that are present among the same four particle indices can be combined into a single
307 aggregated interaction. This allows to reuse the particle coordinates loaded from memory
308 for multiple types and also combines the store operations for the forces.
309 Type aggregates also likely simplify an efficient GPU implementation of listed forces.
311 * Separation of a topology containing both parameter sets for a system state A and B into two
312 separate topologies for the A and B system states.