docs/reference-manual/special/awh.rst

   1 .. _awh:
   2
   3 Adaptive biasing with AWH
   4 -------------------------
   5
   6 The accelerated weight histogram method
   7 :ref:`137 <reflindahl2014accelerated>` calculates the PMF along a reaction coordinate by adding
   8 an adaptively determined biasing potential. AWH flattens free energy
   9 barriers along the reaction coordinate by applying a history-dependent
  10 potential to the system that “fills up” free energy minima. This is
  11 similar in spirit to other adaptive biasing potential methods, e.g. the
  12 Wang-Landau \ :ref:`138 <refwang2001efficient>`, local
  13 elevation \ :ref:`139 <refhuber1994local>` and
  14 metadynamics \ :ref:`140 <reflaio2002escaping>` methods.
  15 The initial sampling stage of AWH makes the method robust against the
  16 choice of input parameters. Furthermore, the target distribution along
  17 the reaction coordinate may be chosen freely.
  18
  19 Basics of the method
  20 ^^^^^^^^^^^^^^^^^^^^
  21
  22 Rather than biasing the reaction coordinate :math:`\xi(x)` directly, AWH
  23 acts on a *reference coordinate* :math:`\lambda`. The reaction
  24 coordinate :math:`\xi(x)` is coupled to :math:`\lambda` with a harmonic
  25 potential
  26
  27 .. math:: Q(\xi,\lambda) = \frac{1}{2} \beta k (\xi - \lambda)^2,
  28
  29 so that for large force constants :math:`k`,
  30 :math:`\xi \approx \lambda`. Note the use of dimensionless energies for
  31 compatibility with previously published work. Units of energy are
  32 obtained by multiplication with :math:`k_BT=1/\beta`. In the simulation,
  33 :math:`\lambda` samples the user-defined sampling interval :math:`I`.
  34 For a multidimensional reaction coordinate :math:`\xi`, the sampling
  35 interval is the Cartesian product :math:`I=\Pi_{\mu} I_{\mu}` (a rectangular
  36 domain). The connection between atom coordinates and :math:`\lambda` is
  37 established through the extended ensemble \ :ref:`68 <refLyubartsev1992>`,
  38
  39 .. math:: P(x,\lambda) = \frac{1}{\mathcal{Z}}e^{g(\lambda) - Q(\xi(x),\lambda) - V(x)},
  40           :label: eqawhpxlambda
  41
  42 where :math:`g(\lambda)` is a bias function (a free variable) and
  43 :math:`V(x)` is the unbiased potential energy of the system. The
  44 distribution along :math:`\lambda` can be tuned to be any predefined
  45 *target distribution* :math:`\rho(\lambda)` (often chosen to be flat) by
  46 choosing :math:`g(\lambda)` wisely. This is evident from
  47
  48 .. math:: P(\lambda) = \int P(x,\lambda)  dx =
  49           \frac{1}{\mathcal{Z}}e^{g(\lambda)} \int e^{- Q(\xi(x),\lambda) - V(x)}  dx
  50           \equiv \frac{1}{\mathcal{Z}}e^{g(\lambda) - F(\lambda)},
  51           :label: eqawhplambda
  52
  53 where :math:`F(\lambda)` is the free energy
  54
  55 .. math:: F(\lambda) = -\ln \int e^{- Q(\xi(x),\lambda) - V(x)}  dx.
  56           :label: eqawhflambda
  57
  58 Being the convolution of the PMF with the Gaussian defined by the
  59 harmonic potential, :math:`F(\lambda)` is a smoothened version of the
  60 PMF. :eq:`Eq. %s <eqawhplambda>` shows that in order to obtain
  61 :math:`P(\lambda)=\rho(\lambda)`, :math:`F(\lambda)` needs to be
  62 determined accurately. Thus, AWH adaptively calculates
  63 :math:`F(\lambda)` and simultaneously converges :math:`P(\lambda)`
  64 toward :math:`\rho(\lambda)`.
  65
  66 The free energy update
  67 ^^^^^^^^^^^^^^^^^^^^^^
  68
  69 AWH is initialized with an estimate of the free energy
  70 :math:`F_0(\lambda)`. At regular time intervals this estimate is updated
  71 using data collected in between the updates. At update :math:`n`, the
  72 applied bias :math:`g_n(\lambda)` is a function of the current free
  73 energy estimate :math:`F_n(\lambda)` and target distribution
  74 :math:`\rho_n(\lambda)`,
  75
  76 .. math:: g_n(\lambda) = \ln \rho_n(\lambda) +F_n(\lambda),
  77           :label: eqawhgrhofrelation
  78
  79 which is consistent with :eq:`Eq. %s <eqawhplambda>`. Note that also the
  80 target distribution may be updated during the simulation (see examples
  81 in section :ref:`awhtargets`). Substituting this choice of :math:`g=g_n`
  82 back into :eq:`Eq. %s <eqawhplambda>` yields the simple free energy update
  83
  84 .. math:: \Delta F_n(\lambda)
  85           = F(\lambda) - F_n(\lambda)
  86           = -\ln\frac{P_n(\lambda)}{\rho_n(\lambda)},
  87           :label: eqawhdfnaive
  88
  89 which would yield a better estimate :math:`F_{n+1} = F_n + \Delta F_n`,
  90 assuming :math:`P_n(\lambda)` can be measured accurately. AWH estimates
  91 :math:`P_n(\lambda)` by regularly calculating the conditional
  92 distribution
  93
  94 .. math:: \omega_n(\lambda|x) \equiv P_n(\lambda|x) = \frac{e^{g_n(\lambda) - Q(\xi(x), \lambda)}}{\sum_{\lambda'} e^{g_n(\lambda') - Q(\xi(x),\lambda')}}.
  95           :label: eqawhomega
  96
  97 Accumulating these probability weights yields
  98 :math:`\sum_t \omega(\lambda|x(t)) \sim P_n(\lambda)`, where
  99 :math:`\int P_n(\lambda|x) P_n(x) dx = P_n(\lambda)` has been used. The
 100 :math:`\omega_n(\lambda|x)` weights are thus the samples of the AWH
 101 method. With the limited amount of sampling one has in practice, update
 102 scheme :eq:`%s <eqawhdfnaive>` yields very noisy results. AWH instead applies a
 103 free energy update that has the same form but which can be applied
 104 repeatedly with limited and localized sampling,
 105
 106 .. math:: \Delta F_n = -\ln \frac{W_n(\lambda) + \sum_t \omega_n(\lambda|x(t))}{W_n(\lambda) + \sum_t\rho_n(\lambda)) }.
 107
 108 Here :math:`W_n(\lambda)` is the *reference weight histogram*
 109 representing prior sampling. The update for :math:`W(\lambda)`,
 110 disregarding the initial stage (see section :ref:`awhinitialstage`), is
 111
 112 .. math:: W_{n+1}(\lambda) = W_n(\lambda) + \sum_t\rho_n(\lambda).
 113           :label: eqawhwupdate
 114
 115 Thus, the weight histogram equals the targeted, “ideal” history of
 116 samples. There are two important things to note about the free energy
 117 update. First, sampling is driven away from oversampled, currently local
 118 regions. For such :math:`\lambda` values,
 119 :math:`\omega_n(\lambda) > \rho_n(\lambda)` and
 120 :math:`\Delta F_n(\lambda) < 0`, which by :eq:`Eq. %s <eqawhgrhofrelation>`
 121 implies :math:`\Delta g_n(\lambda) < 0` (assuming
 122 :math:`\Delta \rho_n \equiv 0`). Thus, the probability to sample
 123 :math:`\lambda` decreases after the update (see :eq:`Eq. %s <eqawhplambda>`).
 124 Secondly, the normalization of the histogram
 125 :math:`N_n=\sum_\lambda W_n(\lambda)`, determines the update size
 126 :math:`| \Delta F(\lambda) |`. For instance, for a single sample
 127 :math:`\omega(\lambda|x)`, the shape of the update is approximately a
 128 Gaussian function of width :math:`\sigma=1/\sqrt{\beta k}` and height
 129 :math:`\propto 1/N_n` :ref:`137 <reflindahl2014accelerated>`,
 130
 131 .. math:: | \Delta F_n(\lambda) | \propto \frac{1}{N_n} e^{-\frac{1}{2} \beta k (\xi(x) - \lambda)^2}.
 132           :label: eqawhdfsize
 133
 134 Therefore, as samples accumulate in :math:`W(\lambda)` and :math:`N_n`
 135 grows, the updates get smaller, allowing for the free energy to
 136 converge.
 137
 138 Note that quantity of interest to the user is not :math:`F(\lambda)` but
 139 the PMF :math:`\Phi(\xi)`. :math:`\Phi(\xi)` is extracted by reweighting
 140 samples :math:`\xi(t)` on the fly \ :ref:`137 <reflindahl2014accelerated>` (see
 141 also section :ref:`awhreweight`) and will converge at the same rate as
 142 :math:`F(\lambda)`, see :numref:`Fig. %s <fig-awhbiasevolution1>`. The PMF will be
 143 written to output (see section :ref:`awhusage`).
 144
 145 Applying the bias to the system
 146 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 147
 148 The bias potential can be applied to the system in two ways. Either by
 149 applying a harmonic potential centered at :math:`\lambda(t)`, which is
 150 sampled using (rejection-free) Monte-Carlo sampling from the conditional
 151 distribution :math:`\omega_n(\lambda | x(t)) = P_n(\lambda | x(t))`, see
 152 :eq:`Eq. %s <eqawhomega>`. This is also called Gibbs sampling or independence
 153 sampling. Alternatively, and by default in the code, the following
 154 *convolved bias potential* can be applied,
 155
 156 .. math:: U_n(\xi) = -\ln \int e^{ g_n(\lambda) -Q(\xi,\lambda)} d \lambda.
 157           :label: eqawhbiaspotential
 158
 159 These two approaches are equivalent in the sense that they give rise to
 160 the same biased probabilities :math:`P_n(x)`
 161 (cf. :eq:`%s <eqawhpxlambda>`) while the dynamics are clearly
 162 different in the two cases. This choice does not affect the internals of
 163 the AWH algorithm, only what force and potential AWH returns to the MD
 164 engine.
 165
 166 .. _fig-awhbiasevolution1:
 167
 168 .. figure:: plots/awh-traj.*
 169         :width: 8.0cm
 170
 171         AWH evolution in time for a Brownian particle in a double-well
 172         potential. The reaction coordinate :math:`\xi(t)` traverses the sampling
 173         interval multiple times in the initial stage before exiting and entering
 174         the final stage. In the final stage, the dynamics of
 175         :math:`\xi` becomes increasingly diffusive.
 176
 177 .. _fig-awhbiasevolution2:
 178
 179 .. figure:: plots/awh-invN.*
 180         :width: 8.0cm
 181
 182         In the final stage, the dynamics of
 183         :math:`\xi` becomes increasingly diffusive. The times of covering are
 184         shown as :math:`\times`-markers of different colors. At these times the
 185         free energy update size :math:`\sim 1/N`, where :math:`N` is the size of
 186         the weight histogram, is decreased by scaling :math:`N` by a factor of
 187         :math:`\gamma=3`.
 188
 189 .. _fig-awhbiasevolution3:
 190
 191 .. figure:: plots/awh-sampleweights.*
 192         :width: 8.0cm
 193
 194         In the final stage, :math:`N` grows at the
 195         sampling rate and thus :math:`1/N\sim1/t`. The exit from the final stage
 196         is determined on the fly by ensuring that the effective sample weight
 197         :math:`s` of data collected in the final stage exceeds that of initial
 198         stage data (note that :math:`\ln s(t)` is plotted).
 199
 200 .. _fig-awhbiasevolution4:
 201
 202 .. figure:: plots/awh-pmfs.*
 203         :width: 8.0cm
 204
 205         An estimate of the PMF is also extracted from the simulation (bottom
 206         right), which after exiting the initial stage should estimate global
 207         free energy differences fairly accurately.
 208
 209 .. _awhinitialstage:
 210
 211 The initial stage
 212 ~~~~~~~~~~~~~~~~~
 213
 214 Initially, when the bias potential is far from optimal, samples will be
 215 highly correlated. In such cases, letting :math:`W(\lambda)` accumulate
 216 samples as prescribed by :eq:`Eq. %s <eqawhwupdate>`, entails
 217 a too rapid decay of the free energy update size. This motivates
 218 splitting the simulation into an *initial stage* where the weight
 219 histogram grows according to a more restrictive and robust protocol, and
 220 a *final stage* where the the weight histogram grows linearly at the
 221 sampling rate (:eq:`Eq. %s <eqawhwupdate>`). The AWH initial
 222 stage takes inspiration from the well-known Wang-Landau algorithm \ :ref:`138 <refwang2001efficient>`,
 223 although there are differences in the details.
 224
 225 In the initial stage the update size is kept constant (by keeping
 226 :math:`N_n` constant) until a transition across the sampling interval
 227 has been detected, a “covering”. For the definition of a covering, see
 228 :eq:`Eq. %s <eqawhcovering>` below. After a covering has
 229 occurred, :math:`N_n` is scaled up by a constant “growth factor”
 230 :math:`\gamma`, chosen heuristically as :math:`\gamma=3`. Thus, in the
 231 initial stage :math:`N_n` is set dynamically as
 232 :math:`N_{n} = \gamma^{m} N_0`, where :math:`m` is the number of
 233 coverings. Since the update size scales as :math:`1/N` (
 234 :eq:`Eq. %s <eqawhdfsize>`) this leads to a close to
 235 exponential decay of the update size in the initial stage, see
 236 :numref:`Fig. %s <fig-awhbiasevolution1>`.
 237
 238 The update size directly determines the rate of change of
 239 :math:`F_n(\lambda)` and hence, from
 240 :eq:`Eq. %s <eqawhgrhofrelation>`, also the rate of change of
 241 the bias funcion :math:`g_n(\lambda)` Thus initially, when :math:`N_n`
 242 is kept small and updates large, the system will be driven along the
 243 reaction coordinate by the constantly fluctuating bias. If :math:`N_0`
 244 is set small enough, the first transition will typically be fast because
 245 of the large update size and will quickly give a first rough estimate of
 246 the free energy. The second transition, using :math:`N_1=\gamma N_0`
 247 refines this estimate further. Thus, rather than very carefully filling
 248 free energy minima using a small initial update size, the sampling
 249 interval is sweeped back-and-forth multiple times, using a wide range of
 250 update sizes, see :numref:`Fig. %s <fig-awhbiasevolution1>`. This
 251 way, the initial stage also makes AWH robust against the choice of
 252 :math:`N_0`.
 253
 254 The covering criterion
 255 ^^^^^^^^^^^^^^^^^^^^^^
 256
 257 In the general case of a multidimensional reaction coordinate
 258 :math:`\lambda=(\lambda_{\mu})`, the sampling interval :math:`I` is
 259 considered covered when all dimensions have been covered. A dimension
 260 :math:`d` is covered if all points :math:`\lambda_{\mu}` in the
 261 one-dimensional sampling interval :math:`I_{\mu}` have been “visited”.
 262 Finally, a point :math:`\lambda_{\mu} \in I_{\mu}` has been visited if there is
 263 at least one point :math:`\lambda^*\in I` with
 264 :math:`\lambda^*_{\mu} = \lambda_{\mu}` that since the last covering has
 265 accumulated probability weight corresponding to the peak of a
 266 multidimensional Gaussian distribution
 267
 268 .. math:: \Delta W(\lambda^*)
 269           \ge w_{\mathrm{peak}}
 270           \equiv \prod_{\mu} \frac{\Delta \lambda_{\mu}}{\sqrt{2\pi}\sigma_k}.
 271           :label: eqawhcovering
 272
 273 Here, :math:`\Delta \lambda_{\mu}` is the point spacing of the discretized
 274 :math:`I_{\mu}` and :math:`\sigma_k=1/\sqrt{\beta k_{\mu}}` (where :math:`k_{\mu}`
 275 is the force constant) is the Gaussian width.
 276
 277 Exit from the initial stage
 278 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 279
 280 For longer times, when major free energy barriers have largely been
 281 flattened by the converging bias potential, the histogram
 282 :math:`W(\lambda)` should grow at the actual sampling rate and the
 283 initial stage needs to be exited \ :ref:`141 <refbelardinelli2007fast>`.
 284 There are multiple reasonable (heuristic) ways of determining when this
 285 transition should take place. One option is to postulate that the number
 286 of samples in the weight histogram :math:`N_n` should never exceed the
 287 actual number of collected samples, and exit the initial stage when this
 288 condition breaks \ :ref:`137 <reflindahl2014accelerated>`. In the initial stage,
 289 :math:`N` grows close to exponentially while the collected number of
 290 samples grows linearly, so an exit will surely occur eventually. Here we
 291 instead apply an exit criterion based on the observation that
 292 “artifically” keeping :math:`N` constant while continuing to collect
 293 samples corresponds to scaling down the relative weight of old samples
 294 relative to new ones. Similarly, the subsequent scaling up of :math:`N`
 295 by a factor :math:`\gamma` corresponds to scaling up the weight of old
 296 data. Briefly, the exit criterion is devised such that the weight of a
 297 sample collected *after* the initial stage is always larger or equal to
 298 the weight of a sample collected *during* the initial stage, see
 299 :numref:`Fig. %s <fig-awhbiasevolution1>`. This is consistent with
 300 scaling down early, noisy data.
 301
 302 The initial stage exit criterion will now be described in detail. We
 303 start out at the beginning of a covering stage, so that :math:`N` has
 304 just been scaled by :math:`\gamma` and is now kept constant. Thus, the
 305 first sample of this stage has the weight :math:`s= 1/\gamma` relative
 306 to the last sample of the previous covering stage. We assume that
 307 :math:`\Delta N` samples are collected and added to :math:`W` for each
 308 update . To keep :math:`N` constant, :math:`W` needs to be scaled down
 309 by a factor :math:`N/(N + \Delta N)` after every update. Equivalently,
 310 this means that new data is scaled up relative to old data by the
 311 inverse factor. Thus, after :math:`\Delta n` updates a new sample has
 312 the relative weight
 313 :math:`s=(1/\gamma) [(N_n + \Delta N)/N_n]^{\Delta n}`. Now assume
 314 covering occurs at this time. To continue to the next covering stage,
 315 :math:`N` should be scaled by :math:`\gamma`, which corresponds to again
 316 multiplying :math:`s` by :math:`1/\gamma`. If at this point
 317 :math:`s \ge \gamma`, then after rescaling :math:`s \ge 1`; i.e. overall
 318 the relative weight of a new sample relative to an old sample is still
 319 growing fast. If on the contrary :math:`s < \gamma`, and this defines
 320 the exit from the initial stage, then the initial stage is over and from
 321 now :math:`N` simply grows at the sampling rate (see
 322 :eq:`Eq. %s <eqawhwupdate>`). To really ensure that
 323 :math:`s\ge 1` holds before exiting, so that samples after the exit have
 324 at least the sample weight of older samples, the last covering stage is
 325 extended by a sufficient number of updates.
 326
 327 .. _awhtargets:
 328
 329 Choice of target distribution
 330 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 331
 332 The target distribution :math:`\rho(\lambda)` is traditionally chosen to
 333 be uniform
 334
 335 .. math:: \rho_{\mathrm{const}}(\lambda) = \mathrm{const.}
 336
 337 This choice exactly flattens :math:`F(\lambda)` in user-defined
 338 sampling interval :math:`I`. Generally,
 339 :math:`\rho(\lambda)=0, \lambda\notin I`. In certain cases other choices
 340 may be preferable. For instance, in the multidimensional case the
 341 rectangular sampling interval is likely to contain regions of very high
 342 free energy, e.g. where atoms are clashing. To exclude such regions,
 343 :math:`\rho(\lambda)` can specified by the following function of the
 344 free energy
 345
 346 .. math:: \rho_{\mathrm{cut}}(\lambda) \propto \frac{1}{1+ e^{F(\lambda) - F_{\mathrm{cut}}}},
 347           :label: eqawhrhocut
 348
 349
 350 where :math:`F_{\mathrm{cut}}` is a free energy cutoff (relative to
 351 :math:`\min_\lambda F(\lambda)`). Thus, regions of the sampling interval
 352 where :math:`F(\lambda) > F_{\mathrm{cut}}` will be exponentially
 353 suppressed (in a smooth fashion). Alternatively, very high free energy
 354 regions could be avoided while still flattening more moderate free
 355 energy barriers by targeting a Boltzmann distribution corresponding to
 356 scaling :math:`\beta=1/k_BT` by a factor :math:`0<s_\beta<1`,
 357
 358 .. math:: \rho_{\mathrm{Boltz}}(\lambda) \propto e^{-s_\beta F(\lambda)},
 359           :label: eqawhrhoboltz
 360
 361 The parameter :math:`s_\beta` determines to what degree the free energy
 362 landscape is flattened; the lower :math:`s_\beta`, the flatter. Note
 363 that both :math:`\rho_{\mathrm{cut}}(\lambda)` and
 364 :math:`\rho_{\mathrm{Boltz}}(\lambda)` depend on :math:`F(\lambda)`,
 365 which needs to be substituted by the current best estimate
 366 :math:`F_n(\lambda)`. Thus, the target distribution is also updated
 367 (consistently with :eq:`Eq. %s <eqawhgrhofrelation>`).
 368
 369 There is in fact an alternative approach to obtaining
 370 :math:`\rho_{\mathrm{Boltz}}(\lambda)` as the limiting target
 371 distribution in AWH, which is particular in the way the weight histogram
 372 :math:`W(\lambda)` and the target distribution :math:`\rho` are updated
 373 and coupled to each other. This yields an evolution of the bias
 374 potential which is very similar to that of well-tempered
 375 metadynamics \ :ref:`142 <refbarducci2008well>`,
 376 see \ :ref:`137 <reflindahl2014accelerated>` for details. Because of the popularity and
 377 success of well-tempered metadynamics, this is a special case worth
 378 considering. In this case :math:`\rho` is a function of the reference
 379 weight histogram
 380
 381 .. math:: \rho_{\mathrm{Boltz,loc}}(\lambda) \propto W(\lambda),
 382
 383 and the update of the weight histogram is modified (cf.
 384 :eq:`Eq. %s <eqawhwupdate>`)
 385
 386 .. math:: W_{n+1}(\lambda) =  W_{n}(\lambda) + s_{\beta}\sum_t \omega(\lambda | x(t)).
 387
 388 Thus, here the weight histogram equals the real history of samples, but
 389 scaled by :math:`s_\beta`. This target distribution is called *local*
 390 Boltzmann since :math:`W` is only modified locally, where sampling has
 391 taken place. We see that when :math:`s_\beta \approx 0` the histogram
 392 essentially does not grow and the size of the free energy update will
 393 stay at a constant value (as in the original formulation of
 394 metadynamics). Thus, the free energy estimate will not converge, but
 395 continue to fluctuate around the correct value. This illustrates the
 396 inherent coupling between the convergence and choice of target
 397 distribution for this special choice of target. Furthermore note that
 398 when using :math:`\rho=\rho_{\mathrm{Boltz,loc}}` there is no initial
 399 stage (section :ref:`awhinitialstage`). The rescaling of the weight
 400 histogram applied in the initial stage is a global operation, which is
 401 incompatible :math:`\rho_{\mathrm{Boltz,loc}}` only depending locally on
 402 the sampling history.
 403
 404 Lastly, the target distribution can be modulated by arbitrary
 405 probability weights
 406
 407 .. math:: \rho(\lambda) = \rho_0(\lambda) w_{\mathrm{user}}(\lambda).
 408
 409 where :math:`w_{\mathrm{user}}(\lambda)` is provided by user data and
 410 in principle :math:`\rho_0(\lambda)` can be any of the target
 411 distributions mentioned above.
 412
 413 Multiple independent or sharing biases
 414 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 415
 416 Multiple independent bias potentials may be applied within one
 417 simulation. This only makes sense if the biased coordinates
 418 :math:`\xi^{(1)}`, :math:`\xi^{(2)}`, :math:`\ldots` evolve essentially
 419 independently from one another. A typical example of this would be when
 420 applying an independent bias to each monomer of a protein. Furthermore,
 421 multiple AWH simulations can be launched in parallel, each with a (set
 422 of) indepedendent biases.
 423
 424 If the defined sampling interval is large relative to the diffusion time
 425 of the reaction coordinate, traversing the sampling interval multiple
 426 times as is required by the initial stage
 427 (section :ref:`awhinitialstage`) may take an infeasible mount of
 428 simulation time. In these cases it could be advantageous to parallelize
 429 the work and have a group of multiple “walkers” :math:`\xi^{(i)}(t)`
 430 share a single bias potential. This can be achieved by collecting
 431 samples from all :math:`\xi^{(i)}` of the same sharing group into a
 432 single histogram and update a common free energy estimate. Samples can
 433 be shared between walkers within the simulation and/or between multiple
 434 simulations. However, currently only sharing between simulations is
 435 supported in the code while all biases within a simulation are
 436 independent.
 437
 438 Note that when attempting to shorten the simulation time by using
 439 bias-sharing walkers, care must be taken to ensure the simulations are
 440 still long enough to properly explore and equilibrate all regions of the
 441 sampling interval. To begin, the walkers in a group should be
 442 decorrelated and distributed approximately according to the target
 443 distribution before starting to refine the free energy. This can be
 444 achieved e.g. by “equilibrating” the shared weight histogram before
 445 letting it grow; for instance, :math:`W(\lambda)/N\approx \rho(\lambda)`
 446 with some tolerance.
 447
 448 Furthermore, the “covering” or transition criterion of the initial stage
 449 should to be generalized to detect when the sampling interval has been
 450 collectively traversed. One alternative is to just use the same
 451 criterion as for a single walker (but now with more samples), see
 452 :eq:`Eq. %s <eqawhcovering>`. However, in contrast to the
 453 single walker case this does not ensure that any real transitions across
 454 the sampling interval has taken place; in principle all walkers could be
 455 sampling only very locally and still cover the whole interval. Just as
 456 with a standard umbrella sampling procedure, the free energy may appear
 457 to be converged while in reality simulations sampling closeby
 458 :math:`\lambda` values are sampling disconnected regions of phase space.
 459 A stricter criterion, which helps avoid such issues, is to require that
 460 before a simulation marks a point :math:`\lambda_{\mu}` along dimension
 461 :math:`\mu` as visited, and shares this with the other walkers, also all
 462 points within a certain diameter :math:`D_{\mathrm{cover}}` should have
 463 been visited (i.e. fulfill :eq:`Eq. %s <eqawhcovering>`).
 464 Increasing :math:`D_{\mathrm{cover}}` increases robustness, but may slow
 465 down convergence. For the maximum value of :math:`D_{\mathrm{cover}}`,
 466 equal to the length of the sampling interval, the sampling interval is
 467 considered covered when at least one walker has independently traversed
 468 the sampling interval.
 469
 470 .. _awhreweight:
 471
 472 Reweighting and combining biased data
 473 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 474
 475 Often one may want to, post-simulation, calculate the unbiased PMF
 476 :math:`\Phi(u)` of another variable :math:`u(x)`. :math:`\Phi(u)` can be
 477 estimated using :math:`\xi`-biased data by reweighting (“unbiasing”) the
 478 trajectory using the bias potential :math:`U_{n(t)}`, see
 479 :eq:`Eq. %s <eqawhbiaspotential>`. Essentially, one bins the
 480 biased data along :math:`u` and removes the effect of :math:`U_{n(t)}`
 481 by dividing the weight of samples :math:`u(t)` by
 482 :math:`e^{-U_{n(t)}(\xi(t))}`,
 483
 484 .. math:: \hat{\Phi}(u)  = -\ln
 485           \sum_t 1_u(u(t))e^{U_{n(t)}(\xi(t)} \mathcal{Z}_{n(t)}.
 486           :label: eqawhunbias
 487
 488 Here the indicator function :math:`1_u` denotes the binning procedure:
 489 :math:`1_u(u') = 1` if :math:`u'` falls into the bin labeled by
 490 :math:`u` and :math:`0` otherwise. The normalization factor
 491 :math:`\mathcal{Z}_n = \int e^{-\Phi(\xi) - U_{n}(\xi)}d \xi` is the
 492 partition function of the extended ensemble. As can be seen
 493 :math:`\mathcal{Z}_n` depends on :math:`\Phi(\xi)`, the PMF of the
 494 (biased) reaction coordinate :math:`\xi` (which is calculated and
 495 written to file by the AWH simulation). It is advisable to use only
 496 final stage data in the reweighting procedure due to the rapid change of
 497 the bias potential during the initial stage. If one would include
 498 initial stage data, one should use the sample weights that are inferred
 499 by the repeated rescaling of the histogram in the initial stage, for the
 500 sake of consistency. Initial stage samples would then in any case be
 501 heavily scaled down relative to final stage samples. Note that
 502 :eq:`Eq. %s <eqawhunbias>` can also be used to combine data
 503 from multiple simulations (by adding another sum also over the
 504 trajectory set). Furthermore, when multiple independent AWH biases have
 505 generated a set of PMF estimates :math:`\{\hat{\Phi}^{(i)}(\xi)\}`, a
 506 combined best estimate :math:`\hat{\Phi}(\xi)` can be obtained by
 507 applying self-consistent exponential averaging. More details on this
 508 procedure and a derivation of :eq:`Eq. %s <eqawhunbias>`
 509 (using slightly different notation) can be found in :ref:`143 <reflindahl2017sequence>`.
 510
 511 .. _awhfriction:
 512
 513 The friction metric
 514 ~~~~~~~~~~~~~~~~~~~
 515
 516 During the AWH simulation, the following time-integrated force
 517 correlation function is calculated,
 518
 519 .. math:: \eta_{\mu\nu}(\lambda) =
 520           \beta
 521           \int_0^\infty
 522           \frac{
 523           \left<{\delta \mathcal{F}_{\mu}(x(t),\lambda)
 524           \delta \mathcal{F}_\nu(x(0),\lambda)
 525           \omega(\lambda|x(t)) \omega(\lambda|x(0))}\right>}
 526           {\left<{\omega^2(\lambda | x)}\right>}
 527           dt.
 528           :label: eqawhmetric
 529
 530 Here
 531 :math:`\mathcal F_\mu(x,\lambda) = k_\mu (\xi_\mu(x) - \lambda_\mu)` is
 532 the force along dimension :math:`\mu` from an harmonic potential
 533 centered at :math:`\lambda` and
 534 :math:`\delta \mathcal F_{\mu}(x,\lambda) = \mathcal F_{\mu}(x,\lambda) - \left<{\mathcal F_\mu(x,\lambda)}\right>`
 535 is the deviation of the force. The factors :math:`\omega(\lambda|x(t))`,
 536 see :eq:`Eq %s <eqawhomega>`, reweight the samples.
 537 :math:`\eta_{\mu\nu}(\lambda)` is a friction
 538 tensor \ :ref:`144 <refsivak2012thermodynamic>`. Its matrix elements are inversely proportional to local
 539 diffusion coefficients. A measure of sampling (in)efficiency at each
 540 :math:`\lambda` is given by
 541
 542 .. math:: \eta^{\frac{1}{2}}(\lambda) = \sqrt{\det\eta_{\mu\nu}(\lambda)}.
 543           :label: eqawhsqrtmetric
 544
 545 A large value of :math:`\eta^{\frac{1}{2}}(\lambda)` indicates slow
 546 dynamics and long correlation times, which may require more sampling.
 547
 548 .. _awhusage:
 549
 550 Usage
 551 ~~~~~
 552
 553 AWH stores data in the energy file (:ref:`edr`) with a frequency set by the
 554 user. The data – the PMF, the convolved bias, distributions of the
 555 :math:`\lambda` and :math:`\xi` coordinates, etc. – can be extracted
 556 after the simulation using the :ref:`gmx awh` tool. Furthermore, the trajectory
 557 of the reaction coordinate :math:`\xi(t)` is printed to the pull output
 558 file :math:`{\tt pullx.xvg}`. The log file (:ref:`log`) also contains
 559 information; check for messages starting with “awh”, they will tell you
 560 about covering and potential sampling issues.
 561
 562 Setting the initial update size
 563 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 564
 565 The initial value of the weight histogram size :math:`N` sets the
 566 initial update size (and the rate of change of the bias). When :math:`N`
 567 is kept constant, like in the initial stage, the average variance of the
 568 free energy scales as :math:`\varepsilon^2 \sim 1/(ND)`
 569 :ref:`137 <reflindahl2014accelerated>`, for a simple model system with constant diffusion
 570 :math:`D` along the reaction coordinate. This provides a ballpark
 571 estimate used by AWH to initialize :math:`N` in terms of more meaningful
 572 quantities
 573
 574 .. math:: \frac{1}{N_0} = \frac{1}{N_0(\varepsilon_0, D)} \sim D\varepsilon_0^2.
 575           :label: eqawhn0
 576
 577 Essentially, this tells us that a slower system (small :math:`D`)
 578 requires more samples (larger :math:`N^0`) to attain the same level of
 579 accuracy (:math:`\varepsilon_0`) at a given sampling rate. Conversely,
 580 for a system of given diffusion, how to choose the initial biasing rate
 581 depends on how good the initial accuracy is. Both the initial error
 582 :math:`\varepsilon_0` and the diffusion :math:`D` only need to be
 583 roughly estimated or guessed. In the typical case, one would only tweak
 584 the :math:`D` parameter, and use a default value for
 585 :math:`\varepsilon_0`. For good convergence, :math:`D` should be chosen
 586 as large as possible (while maintaining a stable system) giving large
 587 initial bias updates and fast initial transitions. Choosing :math:`D`
 588 too small can lead to slow initial convergence. It may be a good idea to
 589 run a short trial simulation and after the first covering check the
 590 maximum free energy difference of the PMF estimate. If this is much
 591 larger than the expected magnitude of the free energy barriers that
 592 should be crossed, then the system is probably being pulled too hard and
 593 :math:`D` should be decreased. :math:`\varepsilon_0` on the other hand,
 594 would only be tweaked when starting an AWH simulation using a fairly
 595 accurate guess of the PMF as input.
 596
 597 Tips for efficient sampling
 598 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 599
 600 The force constant :math:`k` should be larger than the curvature of the
 601 PMF landscape. If this is not the case, the distributions of the
 602 reaction coordinate :math:`\xi` and the reference coordinate
 603 :math:`\lambda`, will differ significantly and warnings will be printed
 604 in the log file. One can choose :math:`k` as large as the time step
 605 supports. This will neccessarily increase the number of points of the
 606 discretized sampling interval :math:`I`. In general however, it should
 607 not affect the performance of the simulation noticeably because the AWH
 608 update is implemented such that only sampled points are accessed at free
 609 energy update time.
 610
 611 As with any method, the choice of reaction coordinate(s) is critical. If
 612 a single reaction coordinate does not suffice, identifying a second
 613 reaction coordinate and sampling the two-dimensional landscape may help.
 614 In this case, using a target distribution with a free energy cutoff (see
 615 :eq:`Eq. %s <eqawhrhocut>`) might be required to avoid
 616 sampling uninteresting regions of very high free energy. Obtaining
 617 accurate free energies for reaction coordinates of much higher
 618 dimensionality than 3 or possibly 4 is generally not feasible.
 619
 620 Monitoring the transition rate of :math:`\xi(t)`, across the sampling
 621 interval is also advisable. For reliable statistics (e.g. when
 622 reweighting the trajectory as described in section :ref:`awhreweight`),
 623 one would generally want to observe at least a few transitions after
 624 having exited the initial stage. Furthermore, if the dynamics of the
 625 reaction coordinate suddenly changes, this may be a sign of e.g. a
 626 reaction coordinate problem.
 627
 628 Difficult regions of sampling may also be detected by calculating the
 629 friction tensor :math:`\eta_{\mu\nu}(\lambda)` in the sampling interval,
 630 see section :ref:`awhfriction`. :math:`\eta_{\mu\nu}(\lambda)` as well
 631 as the sampling efficiency measure :math:`\eta^{\frac{1}{2}}(\lambda)`
 632 (:eq:`Eq. %s <eqawhsqrtmetric>`) are written to the energy file and can be
 633 extracted with :ref:`gmx awh`. A high peak in
 634 :math:`\eta^{\frac{1}{2}}(\lambda)` indicates that this region requires
 635 longer time to sample properly.