docs/reference-manual/special/awh.rst

   1 .. _awh:
   2
   3 Adaptive biasing with AWH
   4 -------------------------
   5
   6 The accelerated weight histogram method
   7 :ref:`137 <reflindahl2014accelerated>` calculates the PMF along a reaction coordinate by adding
   8 an adaptively determined biasing potential. AWH flattens free energy
   9 barriers along the reaction coordinate by applying a history-dependent
  10 potential to the system that “fills up” free energy minima. This is
  11 similar in spirit to other adaptive biasing potential methods, e.g. the
  12 Wang-Landau \ :ref:`138 <refwang2001efficient>`, local
  13 elevation \ :ref:`139 <refhuber1994local>` and
  14 metadynamics \ :ref:`140 <reflaio2002escaping>` methods.
  15 The initial sampling stage of AWH makes the method robust against the
  16 choice of input parameters. Furthermore, the target distribution along
  17 the reaction coordinate may be chosen freely.
  18
  19 Basics of the method
  20 ^^^^^^^^^^^^^^^^^^^^
  21
  22 Rather than biasing the reaction coordinate :math:`\xi(x)` directly, AWH
  23 acts on a *reference coordinate* :math:`\lambda`. The reaction
  24 coordinate :math:`\xi(x)` is coupled to :math:`\lambda` with a harmonic
  25 potential
  26
  27 .. math:: Q(\xi,\lambda) = \frac{1}{2} \beta k (\xi - \lambda)^2,
  28           :label: eqnawhbasic
  29
  30 so that for large force constants :math:`k`,
  31 :math:`\xi \approx \lambda`. Note the use of dimensionless energies for
  32 compatibility with previously published work. Units of energy are
  33 obtained by multiplication with :math:`k_BT=1/\beta`. In the simulation,
  34 :math:`\lambda` samples the user-defined sampling interval :math:`I`.
  35 For a multidimensional reaction coordinate :math:`\xi`, the sampling
  36 interval is the Cartesian product :math:`I=\Pi_{\mu} I_{\mu}` (a rectangular
  37 domain). The connection between atom coordinates and :math:`\lambda` is
  38 established through the extended ensemble \ :ref:`68 <refLyubartsev1992>`,
  39
  40 .. math:: P(x,\lambda) = \frac{1}{\mathcal{Z}}e^{g(\lambda) - Q(\xi(x),\lambda) - V(x)},
  41           :label: eqawhpxlambda
  42
  43 where :math:`g(\lambda)` is a bias function (a free variable) and
  44 :math:`V(x)` is the unbiased potential energy of the system. The
  45 distribution along :math:`\lambda` can be tuned to be any predefined
  46 *target distribution* :math:`\rho(\lambda)` (often chosen to be flat) by
  47 choosing :math:`g(\lambda)` wisely. This is evident from
  48
  49 .. math:: P(\lambda) = \int P(x,\lambda)  dx =
  50           \frac{1}{\mathcal{Z}}e^{g(\lambda)} \int e^{- Q(\xi(x),\lambda) - V(x)}  dx
  51           \equiv \frac{1}{\mathcal{Z}}e^{g(\lambda) - F(\lambda)},
  52           :label: eqawhplambda
  53
  54 where :math:`F(\lambda)` is the free energy
  55
  56 .. math:: F(\lambda) = -\ln \int e^{- Q(\xi(x),\lambda) - V(x)}  dx.
  57           :label: eqawhflambda
  58
  59 Being the convolution of the PMF with the Gaussian defined by the
  60 harmonic potential, :math:`F(\lambda)` is a smoothened version of the
  61 PMF. :eq:`Eq. %s <eqawhplambda>` shows that in order to obtain
  62 :math:`P(\lambda)=\rho(\lambda)`, :math:`F(\lambda)` needs to be
  63 determined accurately. Thus, AWH adaptively calculates
  64 :math:`F(\lambda)` and simultaneously converges :math:`P(\lambda)`
  65 toward :math:`\rho(\lambda)`.
  66
  67 The free energy update
  68 ^^^^^^^^^^^^^^^^^^^^^^
  69
  70 AWH is initialized with an estimate of the free energy
  71 :math:`F_0(\lambda)`. At regular time intervals this estimate is updated
  72 using data collected in between the updates. At update :math:`n`, the
  73 applied bias :math:`g_n(\lambda)` is a function of the current free
  74 energy estimate :math:`F_n(\lambda)` and target distribution
  75 :math:`\rho_n(\lambda)`,
  76
  77 .. math:: g_n(\lambda) = \ln \rho_n(\lambda) +F_n(\lambda),
  78           :label: eqawhgrhofrelation
  79
  80 which is consistent with :eq:`Eq. %s <eqawhplambda>`. Note that also the
  81 target distribution may be updated during the simulation (see examples
  82 in section :ref:`awhtargets`). Substituting this choice of :math:`g=g_n`
  83 back into :eq:`Eq. %s <eqawhplambda>` yields the simple free energy update
  84
  85 .. math:: \Delta F_n(\lambda)
  86           = F(\lambda) - F_n(\lambda)
  87           = -\ln\frac{P_n(\lambda)}{\rho_n(\lambda)},
  88           :label: eqawhdfnaive
  89
  90 which would yield a better estimate :math:`F_{n+1} = F_n + \Delta F_n`,
  91 assuming :math:`P_n(\lambda)` can be measured accurately. AWH estimates
  92 :math:`P_n(\lambda)` by regularly calculating the conditional
  93 distribution
  94
  95 .. math:: \omega_n(\lambda|x) \equiv P_n(\lambda|x) = \frac{e^{g_n(\lambda) - Q(\xi(x), \lambda)}}{\sum_{\lambda'} e^{g_n(\lambda') - Q(\xi(x),\lambda')}}.
  96           :label: eqawhomega
  97
  98 Accumulating these probability weights yields
  99 :math:`\sum_t \omega(\lambda|x(t)) \sim P_n(\lambda)`, where
 100 :math:`\int P_n(\lambda|x) P_n(x) dx = P_n(\lambda)` has been used. The
 101 :math:`\omega_n(\lambda|x)` weights are thus the samples of the AWH
 102 method. With the limited amount of sampling one has in practice, update
 103 scheme :eq:`%s <eqawhdfnaive>` yields very noisy results. AWH instead applies a
 104 free energy update that has the same form but which can be applied
 105 repeatedly with limited and localized sampling,
 106
 107 .. math:: \Delta F_n = -\ln \frac{W_n(\lambda) + \sum_t \omega_n(\lambda|x(t))}{W_n(\lambda) + \sum_t\rho_n(\lambda)) }.
 108           :label: eqnawhsampling
 109
 110 Here :math:`W_n(\lambda)` is the *reference weight histogram*
 111 representing prior sampling. The update for :math:`W(\lambda)`,
 112 disregarding the initial stage (see section :ref:`awhinitialstage`), is
 113
 114 .. math:: W_{n+1}(\lambda) = W_n(\lambda) + \sum_t\rho_n(\lambda).
 115           :label: eqawhwupdate
 116
 117 Thus, the weight histogram equals the targeted, “ideal” history of
 118 samples. There are two important things to note about the free energy
 119 update. First, sampling is driven away from oversampled, currently local
 120 regions. For such :math:`\lambda` values,
 121 :math:`\omega_n(\lambda) > \rho_n(\lambda)` and
 122 :math:`\Delta F_n(\lambda) < 0`, which by :eq:`Eq. %s <eqawhgrhofrelation>`
 123 implies :math:`\Delta g_n(\lambda) < 0` (assuming
 124 :math:`\Delta \rho_n \equiv 0`). Thus, the probability to sample
 125 :math:`\lambda` decreases after the update (see :eq:`Eq. %s <eqawhplambda>`).
 126 Secondly, the normalization of the histogram
 127 :math:`N_n=\sum_\lambda W_n(\lambda)`, determines the update size
 128 :math:`| \Delta F(\lambda) |`. For instance, for a single sample
 129 :math:`\omega(\lambda|x)`, the shape of the update is approximately a
 130 Gaussian function of width :math:`\sigma=1/\sqrt{\beta k}` and height
 131 :math:`\propto 1/N_n` :ref:`137 <reflindahl2014accelerated>`,
 132
 133 .. math:: | \Delta F_n(\lambda) | \propto \frac{1}{N_n} e^{-\frac{1}{2} \beta k (\xi(x) - \lambda)^2}.
 134           :label: eqawhdfsize
 135
 136 Therefore, as samples accumulate in :math:`W(\lambda)` and :math:`N_n`
 137 grows, the updates get smaller, allowing for the free energy to
 138 converge.
 139
 140 Note that quantity of interest to the user is not :math:`F(\lambda)` but
 141 the PMF :math:`\Phi(\xi)`. :math:`\Phi(\xi)` is extracted by reweighting
 142 samples :math:`\xi(t)` on the fly \ :ref:`137 <reflindahl2014accelerated>` (see
 143 also section :ref:`awhreweight`) and will converge at the same rate as
 144 :math:`F(\lambda)`, see :numref:`Fig. %s <fig-awhbiasevolution1>`. The PMF will be
 145 written to output (see section :ref:`awhusage`).
 146
 147 Applying the bias to the system
 148 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 149
 150 The bias potential can be applied to the system in two ways. Either by
 151 applying a harmonic potential centered at :math:`\lambda(t)`, which is
 152 sampled using (rejection-free) Monte-Carlo sampling from the conditional
 153 distribution :math:`\omega_n(\lambda | x(t)) = P_n(\lambda | x(t))`, see
 154 :eq:`Eq. %s <eqawhomega>`. This is also called Gibbs sampling or independence
 155 sampling. Alternatively, and by default in the code, the following
 156 *convolved bias potential* can be applied,
 157
 158 .. math:: U_n(\xi) = -\ln \int e^{ g_n(\lambda) -Q(\xi,\lambda)} d \lambda.
 159           :label: eqawhbiaspotential
 160
 161 These two approaches are equivalent in the sense that they give rise to
 162 the same biased probabilities :math:`P_n(x)`
 163 (cf. :eq:`%s <eqawhpxlambda>`) while the dynamics are clearly
 164 different in the two cases. This choice does not affect the internals of
 165 the AWH algorithm, only what force and potential AWH returns to the MD
 166 engine.
 167
 168 .. _fig-awhbiasevolution1:
 169
 170 .. figure:: plots/awh-traj.*
 171         :width: 8.0cm
 172
 173         AWH evolution in time for a Brownian particle in a double-well
 174         potential. The reaction coordinate :math:`\xi(t)` traverses the sampling
 175         interval multiple times in the initial stage before exiting and entering
 176         the final stage. In the final stage, the dynamics of
 177         :math:`\xi` becomes increasingly diffusive.
 178
 179 .. _fig-awhbiasevolution2:
 180
 181 .. figure:: plots/awh-invN.*
 182         :width: 8.0cm
 183
 184         In the final stage, the dynamics of
 185         :math:`\xi` becomes increasingly diffusive. The times of covering are
 186         shown as :math:`\times`-markers of different colors. At these times the
 187         free energy update size :math:`\sim 1/N`, where :math:`N` is the size of
 188         the weight histogram, is decreased by scaling :math:`N` by a factor of
 189         :math:`\gamma=3`.
 190
 191 .. _fig-awhbiasevolution3:
 192
 193 .. figure:: plots/awh-sampleweights.*
 194         :width: 8.0cm
 195
 196         In the final stage, :math:`N` grows at the
 197         sampling rate and thus :math:`1/N\sim1/t`. The exit from the final stage
 198         is determined on the fly by ensuring that the effective sample weight
 199         :math:`s` of data collected in the final stage exceeds that of initial
 200         stage data (note that :math:`\ln s(t)` is plotted).
 201
 202 .. _fig-awhbiasevolution4:
 203
 204 .. figure:: plots/awh-pmfs.*
 205         :width: 8.0cm
 206
 207         An estimate of the PMF is also extracted from the simulation (bottom
 208         right), which after exiting the initial stage should estimate global
 209         free energy differences fairly accurately.
 210
 211 .. _awhinitialstage:
 212
 213 The initial stage
 214 ~~~~~~~~~~~~~~~~~
 215
 216 Initially, when the bias potential is far from optimal, samples will be
 217 highly correlated. In such cases, letting :math:`W(\lambda)` accumulate
 218 samples as prescribed by :eq:`Eq. %s <eqawhwupdate>`, entails
 219 a too rapid decay of the free energy update size. This motivates
 220 splitting the simulation into an *initial stage* where the weight
 221 histogram grows according to a more restrictive and robust protocol, and
 222 a *final stage* where the weight histogram grows linearly at the
 223 sampling rate (:eq:`Eq. %s <eqawhwupdate>`). The AWH initial
 224 stage takes inspiration from the well-known Wang-Landau algorithm \ :ref:`138 <refwang2001efficient>`,
 225 although there are differences in the details.
 226
 227 In the initial stage the update size is kept constant (by keeping
 228 :math:`N_n` constant) until a transition across the sampling interval
 229 has been detected, a “covering”. For the definition of a covering, see
 230 :eq:`Eq. %s <eqawhcovering>` below. After a covering has
 231 occurred, :math:`N_n` is scaled up by a constant “growth factor”
 232 :math:`\gamma`, chosen heuristically as :math:`\gamma=3`. Thus, in the
 233 initial stage :math:`N_n` is set dynamically as
 234 :math:`N_{n} = \gamma^{m} N_0`, where :math:`m` is the number of
 235 coverings. Since the update size scales as :math:`1/N` (
 236 :eq:`Eq. %s <eqawhdfsize>`) this leads to a close to
 237 exponential decay of the update size in the initial stage, see
 238 :numref:`Fig. %s <fig-awhbiasevolution1>`.
 239
 240 The update size directly determines the rate of change of
 241 :math:`F_n(\lambda)` and hence, from
 242 :eq:`Eq. %s <eqawhgrhofrelation>`, also the rate of change of
 243 the bias funcion :math:`g_n(\lambda)` Thus initially, when :math:`N_n`
 244 is kept small and updates large, the system will be driven along the
 245 reaction coordinate by the constantly fluctuating bias. If :math:`N_0`
 246 is set small enough, the first transition will typically be fast because
 247 of the large update size and will quickly give a first rough estimate of
 248 the free energy. The second transition, using :math:`N_1=\gamma N_0`
 249 refines this estimate further. Thus, rather than very carefully filling
 250 free energy minima using a small initial update size, the sampling
 251 interval is sweeped back-and-forth multiple times, using a wide range of
 252 update sizes, see :numref:`Fig. %s <fig-awhbiasevolution1>`. This
 253 way, the initial stage also makes AWH robust against the choice of
 254 :math:`N_0`.
 255
 256 The covering criterion
 257 ^^^^^^^^^^^^^^^^^^^^^^
 258
 259 In the general case of a multidimensional reaction coordinate
 260 :math:`\lambda=(\lambda_{\mu})`, the sampling interval :math:`I` is
 261 considered covered when all dimensions have been covered. A dimension
 262 :math:`d` is covered if all points :math:`\lambda_{\mu}` in the
 263 one-dimensional sampling interval :math:`I_{\mu}` have been “visited”.
 264 Finally, a point :math:`\lambda_{\mu} \in I_{\mu}` has been visited if there is
 265 at least one point :math:`\lambda^*\in I` with
 266 :math:`\lambda^*_{\mu} = \lambda_{\mu}` that since the last covering has
 267 accumulated probability weight corresponding to the peak of a
 268 multidimensional Gaussian distribution
 269
 270 .. math:: \Delta W(\lambda^*)
 271           \ge w_{\mathrm{peak}}
 272           \equiv \prod_{\mu} \frac{\Delta \lambda_{\mu}}{\sqrt{2\pi}\sigma_k}.
 273           :label: eqawhcovering
 274
 275 Here, :math:`\Delta \lambda_{\mu}` is the point spacing of the discretized
 276 :math:`I_{\mu}` and :math:`\sigma_k=1/\sqrt{\beta k_{\mu}}` (where :math:`k_{\mu}`
 277 is the force constant) is the Gaussian width.
 278
 279 Exit from the initial stage
 280 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 281
 282 For longer times, when major free energy barriers have largely been
 283 flattened by the converging bias potential, the histogram
 284 :math:`W(\lambda)` should grow at the actual sampling rate and the
 285 initial stage needs to be exited \ :ref:`141 <refbelardinelli2007fast>`.
 286 There are multiple reasonable (heuristic) ways of determining when this
 287 transition should take place. One option is to postulate that the number
 288 of samples in the weight histogram :math:`N_n` should never exceed the
 289 actual number of collected samples, and exit the initial stage when this
 290 condition breaks \ :ref:`137 <reflindahl2014accelerated>`. In the initial stage,
 291 :math:`N` grows close to exponentially while the collected number of
 292 samples grows linearly, so an exit will surely occur eventually. Here we
 293 instead apply an exit criterion based on the observation that
 294 “artifically” keeping :math:`N` constant while continuing to collect
 295 samples corresponds to scaling down the relative weight of old samples
 296 relative to new ones. Similarly, the subsequent scaling up of :math:`N`
 297 by a factor :math:`\gamma` corresponds to scaling up the weight of old
 298 data. Briefly, the exit criterion is devised such that the weight of a
 299 sample collected *after* the initial stage is always larger or equal to
 300 the weight of a sample collected *during* the initial stage, see
 301 :numref:`Fig. %s <fig-awhbiasevolution1>`. This is consistent with
 302 scaling down early, noisy data.
 303
 304 The initial stage exit criterion will now be described in detail. We
 305 start out at the beginning of a covering stage, so that :math:`N` has
 306 just been scaled by :math:`\gamma` and is now kept constant. Thus, the
 307 first sample of this stage has the weight :math:`s= 1/\gamma` relative
 308 to the last sample of the previous covering stage. We assume that
 309 :math:`\Delta N` samples are collected and added to :math:`W` for each
 310 update . To keep :math:`N` constant, :math:`W` needs to be scaled down
 311 by a factor :math:`N/(N + \Delta N)` after every update. Equivalently,
 312 this means that new data is scaled up relative to old data by the
 313 inverse factor. Thus, after :math:`\Delta n` updates a new sample has
 314 the relative weight
 315 :math:`s=(1/\gamma) [(N_n + \Delta N)/N_n]^{\Delta n}`. Now assume
 316 covering occurs at this time. To continue to the next covering stage,
 317 :math:`N` should be scaled by :math:`\gamma`, which corresponds to again
 318 multiplying :math:`s` by :math:`1/\gamma`. If at this point
 319 :math:`s \ge \gamma`, then after rescaling :math:`s \ge 1`; i.e. overall
 320 the relative weight of a new sample relative to an old sample is still
 321 growing fast. If on the contrary :math:`s < \gamma`, and this defines
 322 the exit from the initial stage, then the initial stage is over and from
 323 now :math:`N` simply grows at the sampling rate (see
 324 :eq:`Eq. %s <eqawhwupdate>`). To really ensure that
 325 :math:`s\ge 1` holds before exiting, so that samples after the exit have
 326 at least the sample weight of older samples, the last covering stage is
 327 extended by a sufficient number of updates.
 328
 329 .. _awhtargets:
 330
 331 Choice of target distribution
 332 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 333
 334 The target distribution :math:`\rho(\lambda)` is traditionally chosen to
 335 be uniform
 336
 337 .. math:: \rho_{\mathrm{const}}(\lambda) = \mathrm{const.}
 338           :label: eqnawhuniformdist
 339
 340 This choice exactly flattens :math:`F(\lambda)` in user-defined
 341 sampling interval :math:`I`. Generally,
 342 :math:`\rho(\lambda)=0, \lambda\notin I`. In certain cases other choices
 343 may be preferable. For instance, in the multidimensional case the
 344 rectangular sampling interval is likely to contain regions of very high
 345 free energy, e.g. where atoms are clashing. To exclude such regions,
 346 :math:`\rho(\lambda)` can specified by the following function of the
 347 free energy
 348
 349 .. math:: \rho_{\mathrm{cut}}(\lambda) \propto \frac{1}{1+ e^{F(\lambda) - F_{\mathrm{cut}}}},
 350           :label: eqawhrhocut
 351
 352
 353 where :math:`F_{\mathrm{cut}}` is a free energy cutoff (relative to
 354 :math:`\min_\lambda F(\lambda)`). Thus, regions of the sampling interval
 355 where :math:`F(\lambda) > F_{\mathrm{cut}}` will be exponentially
 356 suppressed (in a smooth fashion). Alternatively, very high free energy
 357 regions could be avoided while still flattening more moderate free
 358 energy barriers by targeting a Boltzmann distribution corresponding to
 359 scaling :math:`\beta=1/k_BT` by a factor :math:`0<s_\beta<1`,
 360
 361 .. math:: \rho_{\mathrm{Boltz}}(\lambda) \propto e^{-s_\beta F(\lambda)},
 362           :label: eqawhrhoboltz
 363
 364 The parameter :math:`s_\beta` determines to what degree the free energy
 365 landscape is flattened; the lower :math:`s_\beta`, the flatter. Note
 366 that both :math:`\rho_{\mathrm{cut}}(\lambda)` and
 367 :math:`\rho_{\mathrm{Boltz}}(\lambda)` depend on :math:`F(\lambda)`,
 368 which needs to be substituted by the current best estimate
 369 :math:`F_n(\lambda)`. Thus, the target distribution is also updated
 370 (consistently with :eq:`Eq. %s <eqawhgrhofrelation>`).
 371
 372 There is in fact an alternative approach to obtaining
 373 :math:`\rho_{\mathrm{Boltz}}(\lambda)` as the limiting target
 374 distribution in AWH, which is particular in the way the weight histogram
 375 :math:`W(\lambda)` and the target distribution :math:`\rho` are updated
 376 and coupled to each other. This yields an evolution of the bias
 377 potential which is very similar to that of well-tempered
 378 metadynamics \ :ref:`142 <refbarducci2008well>`,
 379 see \ :ref:`137 <reflindahl2014accelerated>` for details. Because of the popularity and
 380 success of well-tempered metadynamics, this is a special case worth
 381 considering. In this case :math:`\rho` is a function of the reference
 382 weight histogram
 383
 384 .. math:: \rho_{\mathrm{Boltz,loc}}(\lambda) \propto W(\lambda),
 385           :label: eqnawhweighthistogram
 386
 387 and the update of the weight histogram is modified (cf.
 388 :eq:`Eq. %s <eqawhwupdate>`)
 389
 390 .. math:: W_{n+1}(\lambda) =  W_{n}(\lambda) + s_{\beta}\sum_t \omega(\lambda | x(t)).
 391           :label: eqnawhupdateweighthist
 392
 393 Thus, here the weight histogram equals the real history of samples, but
 394 scaled by :math:`s_\beta`. This target distribution is called *local*
 395 Boltzmann since :math:`W` is only modified locally, where sampling has
 396 taken place. We see that when :math:`s_\beta \approx 0` the histogram
 397 essentially does not grow and the size of the free energy update will
 398 stay at a constant value (as in the original formulation of
 399 metadynamics). Thus, the free energy estimate will not converge, but
 400 continue to fluctuate around the correct value. This illustrates the
 401 inherent coupling between the convergence and choice of target
 402 distribution for this special choice of target. Furthermore note that
 403 when using :math:`\rho=\rho_{\mathrm{Boltz,loc}}` there is no initial
 404 stage (section :ref:`awhinitialstage`). The rescaling of the weight
 405 histogram applied in the initial stage is a global operation, which is
 406 incompatible :math:`\rho_{\mathrm{Boltz,loc}}` only depending locally on
 407 the sampling history.
 408
 409 Lastly, the target distribution can be modulated by arbitrary
 410 probability weights
 411
 412 .. math:: \rho(\lambda) = \rho_0(\lambda) w_{\mathrm{user}}(\lambda).
 413           :label: eqnawhpropweigth
 414
 415 where :math:`w_{\mathrm{user}}(\lambda)` is provided by user data and
 416 in principle :math:`\rho_0(\lambda)` can be any of the target
 417 distributions mentioned above.
 418
 419 Multiple independent or sharing biases
 420 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 421
 422 Multiple independent bias potentials may be applied within one
 423 simulation. This only makes sense if the biased coordinates
 424 :math:`\xi^{(1)}`, :math:`\xi^{(2)}`, :math:`\ldots` evolve essentially
 425 independently from one another. A typical example of this would be when
 426 applying an independent bias to each monomer of a protein. Furthermore,
 427 multiple AWH simulations can be launched in parallel, each with a (set
 428 of) indepedendent biases.
 429
 430 If the defined sampling interval is large relative to the diffusion time
 431 of the reaction coordinate, traversing the sampling interval multiple
 432 times as is required by the initial stage
 433 (section :ref:`awhinitialstage`) may take an infeasible mount of
 434 simulation time. In these cases it could be advantageous to parallelize
 435 the work and have a group of multiple “walkers” :math:`\xi^{(i)}(t)`
 436 share a single bias potential. This can be achieved by collecting
 437 samples from all :math:`\xi^{(i)}` of the same sharing group into a
 438 single histogram and update a common free energy estimate. Samples can
 439 be shared between walkers within the simulation and/or between multiple
 440 simulations. However, currently only sharing between simulations is
 441 supported in the code while all biases within a simulation are
 442 independent.
 443
 444 Note that when attempting to shorten the simulation time by using
 445 bias-sharing walkers, care must be taken to ensure the simulations are
 446 still long enough to properly explore and equilibrate all regions of the
 447 sampling interval. To begin, the walkers in a group should be
 448 decorrelated and distributed approximately according to the target
 449 distribution before starting to refine the free energy. This can be
 450 achieved e.g. by “equilibrating” the shared weight histogram before
 451 letting it grow; for instance, :math:`W(\lambda)/N\approx \rho(\lambda)`
 452 with some tolerance.
 453
 454 Furthermore, the “covering” or transition criterion of the initial stage
 455 should to be generalized to detect when the sampling interval has been
 456 collectively traversed. One alternative is to just use the same
 457 criterion as for a single walker (but now with more samples), see
 458 :eq:`Eq. %s <eqawhcovering>`. However, in contrast to the
 459 single walker case this does not ensure that any real transitions across
 460 the sampling interval has taken place; in principle all walkers could be
 461 sampling only very locally and still cover the whole interval. Just as
 462 with a standard umbrella sampling procedure, the free energy may appear
 463 to be converged while in reality simulations sampling closeby
 464 :math:`\lambda` values are sampling disconnected regions of phase space.
 465 A stricter criterion, which helps avoid such issues, is to require that
 466 before a simulation marks a point :math:`\lambda_{\mu}` along dimension
 467 :math:`\mu` as visited, and shares this with the other walkers, also all
 468 points within a certain diameter :math:`D_{\mathrm{cover}}` should have
 469 been visited (i.e. fulfill :eq:`Eq. %s <eqawhcovering>`).
 470 Increasing :math:`D_{\mathrm{cover}}` increases robustness, but may slow
 471 down convergence. For the maximum value of :math:`D_{\mathrm{cover}}`,
 472 equal to the length of the sampling interval, the sampling interval is
 473 considered covered when at least one walker has independently traversed
 474 the sampling interval.
 475
 476 .. _awhreweight:
 477
 478 Reweighting and combining biased data
 479 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 480
 481 Often one may want to, post-simulation, calculate the unbiased PMF
 482 :math:`\Phi(u)` of another variable :math:`u(x)`. :math:`\Phi(u)` can be
 483 estimated using :math:`\xi`-biased data by reweighting (“unbiasing”) the
 484 trajectory using the bias potential :math:`U_{n(t)}`, see
 485 :eq:`Eq. %s <eqawhbiaspotential>`. Essentially, one bins the
 486 biased data along :math:`u` and removes the effect of :math:`U_{n(t)}`
 487 by dividing the weight of samples :math:`u(t)` by
 488 :math:`e^{-U_{n(t)}(\xi(t))}`,
 489
 490 .. math:: \hat{\Phi}(u)  = -\ln
 491           \sum_t 1_u(u(t))e^{U_{n(t)}(\xi(t)} \mathcal{Z}_{n(t)}.
 492           :label: eqawhunbias
 493
 494 Here the indicator function :math:`1_u` denotes the binning procedure:
 495 :math:`1_u(u') = 1` if :math:`u'` falls into the bin labeled by
 496 :math:`u` and :math:`0` otherwise. The normalization factor
 497 :math:`\mathcal{Z}_n = \int e^{-\Phi(\xi) - U_{n}(\xi)}d \xi` is the
 498 partition function of the extended ensemble. As can be seen
 499 :math:`\mathcal{Z}_n` depends on :math:`\Phi(\xi)`, the PMF of the
 500 (biased) reaction coordinate :math:`\xi` (which is calculated and
 501 written to file by the AWH simulation). It is advisable to use only
 502 final stage data in the reweighting procedure due to the rapid change of
 503 the bias potential during the initial stage. If one would include
 504 initial stage data, one should use the sample weights that are inferred
 505 by the repeated rescaling of the histogram in the initial stage, for the
 506 sake of consistency. Initial stage samples would then in any case be
 507 heavily scaled down relative to final stage samples. Note that
 508 :eq:`Eq. %s <eqawhunbias>` can also be used to combine data
 509 from multiple simulations (by adding another sum also over the
 510 trajectory set). Furthermore, when multiple independent AWH biases have
 511 generated a set of PMF estimates :math:`\{\hat{\Phi}^{(i)}(\xi)\}`, a
 512 combined best estimate :math:`\hat{\Phi}(\xi)` can be obtained by
 513 applying self-consistent exponential averaging. More details on this
 514 procedure and a derivation of :eq:`Eq. %s <eqawhunbias>`
 515 (using slightly different notation) can be found in :ref:`143 <reflindahl2017sequence>`.
 516
 517 .. _awhfriction:
 518
 519 The friction metric
 520 ~~~~~~~~~~~~~~~~~~~
 521
 522 During the AWH simulation, the following time-integrated force
 523 correlation function is calculated,
 524
 525 .. math:: \eta_{\mu\nu}(\lambda) =
 526           \beta
 527           \int_0^\infty
 528           \frac{
 529           \left<{\delta \mathcal{F}_{\mu}(x(t),\lambda)
 530           \delta \mathcal{F}_\nu(x(0),\lambda)
 531           \omega(\lambda|x(t)) \omega(\lambda|x(0))}\right>}
 532           {\left<{\omega^2(\lambda | x)}\right>}
 533           dt.
 534           :label: eqawhmetric
 535
 536 Here
 537 :math:`\mathcal F_\mu(x,\lambda) = k_\mu (\xi_\mu(x) - \lambda_\mu)` is
 538 the force along dimension :math:`\mu` from an harmonic potential
 539 centered at :math:`\lambda` and
 540 :math:`\delta \mathcal F_{\mu}(x,\lambda) = \mathcal F_{\mu}(x,\lambda) - \left<{\mathcal F_\mu(x,\lambda)}\right>`
 541 is the deviation of the force. The factors :math:`\omega(\lambda|x(t))`,
 542 see :eq:`Eq %s <eqawhomega>`, reweight the samples.
 543 :math:`\eta_{\mu\nu}(\lambda)` is a friction
 544 tensor \ :ref:`144 <refsivak2012thermodynamic>`. Its matrix elements are inversely proportional to local
 545 diffusion coefficients. A measure of sampling (in)efficiency at each
 546 :math:`\lambda` is given by
 547
 548 .. math:: \eta^{\frac{1}{2}}(\lambda) = \sqrt{\det\eta_{\mu\nu}(\lambda)}.
 549           :label: eqawhsqrtmetric
 550
 551 A large value of :math:`\eta^{\frac{1}{2}}(\lambda)` indicates slow
 552 dynamics and long correlation times, which may require more sampling.
 553
 554 .. _awhusage:
 555
 556 Usage
 557 ~~~~~
 558
 559 AWH stores data in the energy file (:ref:`edr`) with a frequency set by the
 560 user. The data – the PMF, the convolved bias, distributions of the
 561 :math:`\lambda` and :math:`\xi` coordinates, etc. – can be extracted
 562 after the simulation using the :ref:`gmx awh` tool. Furthermore, the trajectory
 563 of the reaction coordinate :math:`\xi(t)` is printed to the pull output
 564 file :math:`{\tt pullx.xvg}`. The log file (:ref:`log`) also contains
 565 information; check for messages starting with “awh”, they will tell you
 566 about covering and potential sampling issues.
 567
 568 Setting the initial update size
 569 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 570
 571 The initial value of the weight histogram size :math:`N` sets the
 572 initial update size (and the rate of change of the bias). When :math:`N`
 573 is kept constant, like in the initial stage, the average variance of the
 574 free energy scales as :math:`\varepsilon^2 \sim 1/(ND)`
 575 :ref:`137 <reflindahl2014accelerated>`, for a simple model system with constant diffusion
 576 :math:`D` along the reaction coordinate. This provides a ballpark
 577 estimate used by AWH to initialize :math:`N` in terms of more meaningful
 578 quantities
 579
 580 .. math:: \frac{1}{N_0} = \frac{1}{N_0(\varepsilon_0, D)} \sim D\varepsilon_0^2.
 581           :label: eqawhn0
 582
 583 Essentially, this tells us that a slower system (small :math:`D`)
 584 requires more samples (larger :math:`N^0`) to attain the same level of
 585 accuracy (:math:`\varepsilon_0`) at a given sampling rate. Conversely,
 586 for a system of given diffusion, how to choose the initial biasing rate
 587 depends on how good the initial accuracy is. Both the initial error
 588 :math:`\varepsilon_0` and the diffusion :math:`D` only need to be
 589 roughly estimated or guessed. In the typical case, one would only tweak
 590 the :math:`D` parameter, and use a default value for
 591 :math:`\varepsilon_0`. For good convergence, :math:`D` should be chosen
 592 as large as possible (while maintaining a stable system) giving large
 593 initial bias updates and fast initial transitions. Choosing :math:`D`
 594 too small can lead to slow initial convergence. It may be a good idea to
 595 run a short trial simulation and after the first covering check the
 596 maximum free energy difference of the PMF estimate. If this is much
 597 larger than the expected magnitude of the free energy barriers that
 598 should be crossed, then the system is probably being pulled too hard and
 599 :math:`D` should be decreased. :math:`\varepsilon_0` on the other hand,
 600 would only be tweaked when starting an AWH simulation using a fairly
 601 accurate guess of the PMF as input.
 602
 603 Tips for efficient sampling
 604 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 605
 606 The force constant :math:`k` should be larger than the curvature of the
 607 PMF landscape. If this is not the case, the distributions of the
 608 reaction coordinate :math:`\xi` and the reference coordinate
 609 :math:`\lambda`, will differ significantly and warnings will be printed
 610 in the log file. One can choose :math:`k` as large as the time step
 611 supports. This will neccessarily increase the number of points of the
 612 discretized sampling interval :math:`I`. In general however, it should
 613 not affect the performance of the simulation noticeably because the AWH
 614 update is implemented such that only sampled points are accessed at free
 615 energy update time.
 616
 617 As with any method, the choice of reaction coordinate(s) is critical. If
 618 a single reaction coordinate does not suffice, identifying a second
 619 reaction coordinate and sampling the two-dimensional landscape may help.
 620 In this case, using a target distribution with a free energy cutoff (see
 621 :eq:`Eq. %s <eqawhrhocut>`) might be required to avoid
 622 sampling uninteresting regions of very high free energy. Obtaining
 623 accurate free energies for reaction coordinates of much higher
 624 dimensionality than 3 or possibly 4 is generally not feasible.
 625
 626 Monitoring the transition rate of :math:`\xi(t)`, across the sampling
 627 interval is also advisable. For reliable statistics (e.g. when
 628 reweighting the trajectory as described in section :ref:`awhreweight`),
 629 one would generally want to observe at least a few transitions after
 630 having exited the initial stage. Furthermore, if the dynamics of the
 631 reaction coordinate suddenly changes, this may be a sign of e.g. a
 632 reaction coordinate problem.
 633
 634 Difficult regions of sampling may also be detected by calculating the
 635 friction tensor :math:`\eta_{\mu\nu}(\lambda)` in the sampling interval,
 636 see section :ref:`awhfriction`. :math:`\eta_{\mu\nu}(\lambda)` as well
 637 as the sampling efficiency measure :math:`\eta^{\frac{1}{2}}(\lambda)`
 638 (:eq:`Eq. %s <eqawhsqrtmetric>`) are written to the energy file and can be
 639 extracted with :ref:`gmx awh`. A high peak in
 640 :math:`\eta^{\frac{1}{2}}(\lambda)` indicates that this region requires
 641 longer time to sample properly.