docs/reference-manual/special/awh.rst

   1 .. _awh:
   2
   3 Adaptive biasing with AWH
   4 -------------------------
   5
   6 The accelerated weight histogram method :ref:`185 <refLidmar2012>`
   7 :ref:`137 <reflindahl2014accelerated>` calculates the PMF along a reaction coordinate by adding
   8 an adaptively determined biasing potential. AWH flattens free energy
   9 barriers along the reaction coordinate by applying a history-dependent
  10 potential to the system that “fills up” free energy minima. This is
  11 similar in spirit to other adaptive biasing potential methods, e.g. the
  12 Wang-Landau \ :ref:`138 <refwang2001efficient>`, local
  13 elevation \ :ref:`139 <refhuber1994local>` and
  14 metadynamics \ :ref:`140 <reflaio2002escaping>` methods.
  15 The initial sampling stage of AWH makes the method robust against the
  16 choice of input parameters. Furthermore, the target distribution along
  17 the reaction coordinate may be chosen freely.
  18
  19 Basics of the method
  20 ^^^^^^^^^^^^^^^^^^^^
  21
  22 Rather than biasing the reaction coordinate :math:`\xi(x)` directly, AWH
  23 acts on a *reference coordinate* :math:`\lambda`. The fundamentals of the
  24 method is based on the connection between atom coordinates and :math:`\lambda` and
  25 is established through the extended ensemble \ :ref:`68 <refLyubartsev1992>`,
  26
  27 .. math:: P(x,\lambda) = \frac{1}{\mathcal{Z}}e^{g(\lambda) - Q(\xi(x),\lambda) - V(x)},
  28           :label: eqawhpxlambda
  29
  30 where :math:`g(\lambda)` is a bias function (a free variable) and
  31 :math:`V(x)` is the unbiased potential energy of the system. The
  32 distribution along :math:`\lambda` can be tuned to be any predefined
  33 *target distribution* :math:`\rho(\lambda)` (often chosen to be flat) by
  34 choosing :math:`g(\lambda)` wisely. This is evident from
  35
  36 .. math:: P(\lambda) = \int P(x,\lambda)  dx =
  37           \frac{1}{\mathcal{Z}}e^{g(\lambda)} \int e^{- Q(\xi(x),\lambda) - V(x)}  dx
  38           \equiv \frac{1}{\mathcal{Z}}e^{g(\lambda) - F(\lambda)},
  39           :label: eqawhplambda
  40
  41 where :math:`F(\lambda)` is the free energy
  42
  43 .. math:: F(\lambda) = -\ln \int e^{- Q(\xi(x),\lambda) - V(x)}  dx.
  44           :label: eqawhflambda
  45
  46 The reaction coordinate :math:`\xi(x)` is commonly coupled to
  47 :math:`\lambda` with a harmonic potential
  48
  49 .. math:: Q(\xi,\lambda) = \frac{1}{2} \beta k (\xi - \lambda)^2,
  50           :label: eqnawhbasic
  51
  52 so that for large force constants :math:`k`,
  53 :math:`\xi \approx \lambda`. Note the use of dimensionless energies for
  54 compatibility with previously published work. Units of energy are
  55 obtained by multiplication with :math:`k_BT=1/\beta`. In the simulation,
  56 :math:`\lambda` samples the user-defined sampling interval :math:`I`.
  57
  58 Being the convolution of the PMF with the Gaussian defined by the
  59 harmonic potential, :math:`F(\lambda)` is a smoothened version of the
  60 PMF. :eq:`Eq. %s <eqawhplambda>` shows that in order to obtain
  61 :math:`P(\lambda)=\rho(\lambda)`, :math:`F(\lambda)` needs to be
  62 determined accurately. Thus, AWH adaptively calculates
  63 :math:`F(\lambda)` and simultaneously converges :math:`P(\lambda)`
  64 toward :math:`\rho(\lambda)`.
  65
  66 It is also possible to directly control the :math:`\lambda` state
  67 of, e.g., alchemical free energy perturbations :ref:`187 <reflundborg2021>`. In that case there is no harmonic
  68 potential and :math:`\lambda` changes in discrete steps along the reaction coordinate
  69 depending on the biased free energy difference between the :math:`\lambda` states.
  70 N.b., it is not yet possible to use AWH in combination with perturbed masses or
  71 constraints.
  72
  73 For a multidimensional reaction coordinate :math:`\xi`, the sampling
  74 interval is the Cartesian product :math:`I=\Pi_{\mu} I_{\mu}` (a rectangular
  75 domain).
  76
  77 The free energy update
  78 ^^^^^^^^^^^^^^^^^^^^^^
  79
  80 AWH is initialized with an estimate of the free energy
  81 :math:`F_0(\lambda)`. At regular time intervals this estimate is updated
  82 using data collected in between the updates. At update :math:`n`, the
  83 applied bias :math:`g_n(\lambda)` is a function of the current free
  84 energy estimate :math:`F_n(\lambda)` and target distribution
  85 :math:`\rho_n(\lambda)`,
  86
  87 .. math:: g_n(\lambda) = \ln \rho_n(\lambda) +F_n(\lambda),
  88           :label: eqawhgrhofrelation
  89
  90 which is consistent with :eq:`Eq. %s <eqawhplambda>`. Note that also the
  91 target distribution may be updated during the simulation (see examples
  92 in section :ref:`awhtargets`). Substituting this choice of :math:`g=g_n`
  93 back into :eq:`Eq. %s <eqawhplambda>` yields the simple free energy update
  94
  95 .. math:: \Delta F_n(\lambda)
  96           = F(\lambda) - F_n(\lambda)
  97           = -\ln\frac{P_n(\lambda)}{\rho_n(\lambda)},
  98           :label: eqawhdfnaive
  99
 100 which would yield a better estimate :math:`F_{n+1} = F_n + \Delta F_n`,
 101 assuming :math:`P_n(\lambda)` can be measured accurately. AWH estimates
 102 :math:`P_n(\lambda)` by regularly calculating the conditional
 103 distribution
 104
 105 .. math:: \omega_n(\lambda|x) \equiv P_n(\lambda|x) = \frac{e^{g_n(\lambda) - Q(\xi(x), \lambda)}}{\sum_{\lambda'} e^{g_n(\lambda') - Q(\xi(x),\lambda')}}.
 106           :label: eqawhomega
 107
 108 Accumulating these probability weights yields
 109 :math:`\sum_t \omega(\lambda|x(t)) \sim P_n(\lambda)`, where
 110 :math:`\int P_n(\lambda|x) P_n(x) dx = P_n(\lambda)` has been used. The
 111 :math:`\omega_n(\lambda|x)` weights are thus the samples of the AWH
 112 method. With the limited amount of sampling one has in practice, update
 113 scheme :eq:`%s <eqawhdfnaive>` yields very noisy results. AWH instead applies a
 114 free energy update that has the same form but which can be applied
 115 repeatedly with limited and localized sampling,
 116
 117 .. math:: \Delta F_n = -\ln \frac{W_n(\lambda) + \sum_t \omega_n(\lambda|x(t))}{W_n(\lambda) + \sum_t\rho_n(\lambda)) }.
 118           :label: eqnawhsampling
 119
 120 Here :math:`W_n(\lambda)` is the *reference weight histogram*
 121 representing prior sampling. The update for :math:`W(\lambda)`,
 122 disregarding the initial stage (see section :ref:`awhinitialstage`), is
 123
 124 .. math:: W_{n+1}(\lambda) = W_n(\lambda) + \sum_t\rho_n(\lambda).
 125           :label: eqawhwupdate
 126
 127 Thus, the weight histogram equals the targeted, “ideal” history of
 128 samples. There are two important things to note about the free energy
 129 update. First, sampling is driven away from oversampled, currently local
 130 regions. For such :math:`\lambda` values,
 131 :math:`\omega_n(\lambda) > \rho_n(\lambda)` and
 132 :math:`\Delta F_n(\lambda) < 0`, which by :eq:`Eq. %s <eqawhgrhofrelation>`
 133 implies :math:`\Delta g_n(\lambda) < 0` (assuming
 134 :math:`\Delta \rho_n \equiv 0`). Thus, the probability to sample
 135 :math:`\lambda` decreases after the update (see :eq:`Eq. %s <eqawhplambda>`).
 136 Secondly, the normalization of the histogram
 137 :math:`N_n=\sum_\lambda W_n(\lambda)`, determines the update size
 138 :math:`| \Delta F(\lambda) |`. For instance, for a single sample
 139 :math:`\omega(\lambda|x)`, and using a harmonic potential
 140 (:see :eq:`Eq. %s <eqnawhbasic>`),
 141 the shape of the update is approximately a
 142 Gaussian function of width :math:`\sigma=1/\sqrt{\beta k}` and height
 143 :math:`\propto 1/N_n` :ref:`137 <reflindahl2014accelerated>`,
 144
 145 .. math:: | \Delta F_n(\lambda) | \propto \frac{1}{N_n} e^{-\frac{1}{2} \beta k (\xi(x) - \lambda)^2}.
 146           :label: eqawhdfsize
 147
 148 When directly controlling the lambda state of the system, the shape of
 149 the update is instead
 150
 151 .. math:: | \Delta F_n(\lambda) | \propto \frac{1}{N_n} P_n(\lambda | x).
 152           :label: eqawhdfsizelambda
 153
 154 Therefore, in both cases, as samples accumulate in :math:`W(\lambda)` and
 155 :math:`N_n` grows, the updates get smaller, allowing for the free energy to
 156 converge.
 157
 158 Note that quantity of interest to the user is not :math:`F(\lambda)` but
 159 the PMF :math:`\Phi(\xi)`. :math:`\Phi(\xi)` is extracted by reweighting
 160 samples :math:`\xi(t)` on the fly \ :ref:`137 <reflindahl2014accelerated>` (see
 161 also section :ref:`awhreweight`) and will converge at the same rate as
 162 :math:`F(\lambda)`, see :numref:`Fig. %s <fig-awhbiasevolution1>`. The PMF will be
 163 written to output (see section :ref:`awhusage`).
 164
 165 Applying the bias to the system
 166 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 167
 168 The bias potential can be applied to the system in two ways. Either by
 169 applying a harmonic potential centered at :math:`\lambda(t)`, which is
 170 sampled using (rejection-free) Monte-Carlo sampling from the conditional
 171 distribution :math:`\omega_n(\lambda | x(t)) = P_n(\lambda | x(t))`, see
 172 :eq:`Eq. %s <eqawhomega>`. This is also called Gibbs sampling or independence
 173 sampling. Alternatively, and by default in the code, the following
 174 *convolved bias potential* can be applied,
 175
 176 .. math:: U_n(\xi) = -\ln \int e^{ g_n(\lambda) -Q(\xi,\lambda)} d \lambda.
 177           :label: eqawhbiaspotential
 178
 179 These two approaches are equivalent in the sense that they give rise to
 180 the same biased probabilities :math:`P_n(x)`
 181 (cf. :eq:`%s <eqawhpxlambda>`) while the dynamics are clearly
 182 different in the two cases. This choice does not affect the internals of
 183 the AWH algorithm, only what force and potential AWH returns to the MD
 184 engine.
 185
 186 Along a bias dimension directly controlling the
 187 :math:`\lambda` state of the system, such as when controlling free energy
 188 perturbations, the Monte-Carlo sampling alternative is always used, even if
 189 a convolved bias potential is chosen to be used along the other dimensions
 190 (if there are more than one).
 191
 192
 193 .. _fig-awhbiasevolution1:
 194
 195 .. figure:: plots/awh-traj.*
 196         :width: 8.0cm
 197
 198         AWH evolution in time for a Brownian particle in a double-well
 199         potential. The reaction coordinate :math:`\xi(t)` traverses the sampling
 200         interval multiple times in the initial stage before exiting and entering
 201         the final stage. In the final stage, the dynamics of
 202         :math:`\xi` becomes increasingly diffusive.
 203
 204 .. _fig-awhbiasevolution2:
 205
 206 .. figure:: plots/awh-invN.*
 207         :width: 8.0cm
 208
 209         In the final stage, the dynamics of
 210         :math:`\xi` becomes increasingly diffusive. The times of covering are
 211         shown as :math:`\times`-markers of different colors. At these times the
 212         free energy update size :math:`\sim 1/N`, where :math:`N` is the size of
 213         the weight histogram, is decreased by scaling :math:`N` by a factor of
 214         :math:`\gamma=3`.
 215
 216 .. _fig-awhbiasevolution3:
 217
 218 .. figure:: plots/awh-sampleweights.*
 219         :width: 8.0cm
 220
 221         In the final stage, :math:`N` grows at the
 222         sampling rate and thus :math:`1/N\sim1/t`. The exit from the final stage
 223         is determined on the fly by ensuring that the effective sample weight
 224         :math:`s` of data collected in the final stage exceeds that of initial
 225         stage data (note that :math:`\ln s(t)` is plotted).
 226
 227 .. _fig-awhbiasevolution4:
 228
 229 .. figure:: plots/awh-pmfs.*
 230         :width: 8.0cm
 231
 232         An estimate of the PMF is also extracted from the simulation (bottom
 233         right), which after exiting the initial stage should estimate global
 234         free energy differences fairly accurately.
 235
 236 .. _awhinitialstage:
 237
 238 The initial stage
 239 ~~~~~~~~~~~~~~~~~
 240
 241 Initially, when the bias potential is far from optimal, samples will be
 242 highly correlated. In such cases, letting :math:`W(\lambda)` accumulate
 243 samples as prescribed by :eq:`Eq. %s <eqawhwupdate>`, entails
 244 a too rapid decay of the free energy update size. This motivates
 245 splitting the simulation into an *initial stage* where the weight
 246 histogram grows according to a more restrictive and robust protocol, and
 247 a *final stage* where the weight histogram grows linearly at the
 248 sampling rate (:eq:`Eq. %s <eqawhwupdate>`). The AWH initial
 249 stage takes inspiration from the well-known Wang-Landau algorithm \ :ref:`138 <refwang2001efficient>`,
 250 although there are differences in the details.
 251
 252 In the initial stage the update size is kept constant (by keeping
 253 :math:`N_n` constant) until a transition across the sampling interval
 254 has been detected, a “covering”. For the definition of a covering, see
 255 :eq:`Eq. %s <eqawhcovering>` below. After a covering has
 256 occurred, :math:`N_n` is scaled up by a constant “growth factor”
 257 :math:`\gamma`, chosen heuristically as :math:`\gamma=3`. Thus, in the
 258 initial stage :math:`N_n` is set dynamically as
 259 :math:`N_{n} = \gamma^{m} N_0`, where :math:`m` is the number of
 260 coverings. Since the update size scales as :math:`1/N` (
 261 :eq:`Eq. %s <eqawhdfsize>`) this leads to a close to
 262 exponential decay of the update size in the initial stage, see
 263 :numref:`Fig. %s <fig-awhbiasevolution1>`.
 264
 265 The update size directly determines the rate of change of
 266 :math:`F_n(\lambda)` and hence, from
 267 :eq:`Eq. %s <eqawhgrhofrelation>`, also the rate of change of
 268 the bias funcion :math:`g_n(\lambda)` Thus initially, when :math:`N_n`
 269 is kept small and updates large, the system will be driven along the
 270 reaction coordinate by the constantly fluctuating bias. If :math:`N_0`
 271 is set small enough, the first transition will typically be fast because
 272 of the large update size and will quickly give a first rough estimate of
 273 the free energy. The second transition, using :math:`N_1=\gamma N_0`
 274 refines this estimate further. Thus, rather than very carefully filling
 275 free energy minima using a small initial update size, the sampling
 276 interval is sweeped back-and-forth multiple times, using a wide range of
 277 update sizes, see :numref:`Fig. %s <fig-awhbiasevolution1>`. This
 278 way, the initial stage also makes AWH robust against the choice of
 279 :math:`N_0`.
 280
 281 The covering criterion
 282 ^^^^^^^^^^^^^^^^^^^^^^
 283
 284 In the general case of a multidimensional reaction coordinate
 285 :math:`\lambda=(\lambda_{\mu})`, the sampling interval :math:`I` is
 286 considered covered when all dimensions have been covered. A dimension
 287 :math:`d` is covered if all points :math:`\lambda_{\mu}` in the
 288 one-dimensional sampling interval :math:`I_{\mu}` have been “visited”.
 289 Finally, a point :math:`\lambda_{\mu} \in I_{\mu}` has been visited if there is
 290 at least one point :math:`\lambda^*\in I` with
 291 :math:`\lambda^*_{\mu} = \lambda_{\mu}` that since the last covering has
 292 accumulated probability weight corresponding to the peak of a
 293 multidimensional Gaussian distribution
 294
 295 .. math:: \Delta W(\lambda^*)
 296           \ge w_{\mathrm{peak}}
 297           \equiv \prod_{\mu} \frac{\Delta \lambda_{\mu}}{\sqrt{2\pi}\sigma_k}.
 298           :label: eqawhcovering
 299
 300 Here, :math:`\Delta \lambda_{\mu}` is the point spacing of the discretized
 301 :math:`I_{\mu}` and :math:`\sigma_k=1/\sqrt{\beta k_{\mu}}` (where :math:`k_{\mu}`
 302 is the force constant) is the Gaussian width.
 303
 304 Exit from the initial stage
 305 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 306
 307 For longer times, when major free energy barriers have largely been
 308 flattened by the converging bias potential, the histogram
 309 :math:`W(\lambda)` should grow at the actual sampling rate and the
 310 initial stage needs to be exited \ :ref:`141 <refbelardinelli2007fast>`.
 311 There are multiple reasonable (heuristic) ways of determining when this
 312 transition should take place. One option is to postulate that the number
 313 of samples in the weight histogram :math:`N_n` should never exceed the
 314 actual number of collected samples, and exit the initial stage when this
 315 condition breaks \ :ref:`137 <reflindahl2014accelerated>`. In the initial stage,
 316 :math:`N` grows close to exponentially while the collected number of
 317 samples grows linearly, so an exit will surely occur eventually. Here we
 318 instead apply an exit criterion based on the observation that
 319 “artifically” keeping :math:`N` constant while continuing to collect
 320 samples corresponds to scaling down the relative weight of old samples
 321 relative to new ones. Similarly, the subsequent scaling up of :math:`N`
 322 by a factor :math:`\gamma` corresponds to scaling up the weight of old
 323 data. Briefly, the exit criterion is devised such that the weight of a
 324 sample collected *after* the initial stage is always larger or equal to
 325 the weight of a sample collected *during* the initial stage, see
 326 :numref:`Fig. %s <fig-awhbiasevolution1>`. This is consistent with
 327 scaling down early, noisy data.
 328
 329 The initial stage exit criterion will now be described in detail. We
 330 start out at the beginning of a covering stage, so that :math:`N` has
 331 just been scaled by :math:`\gamma` and is now kept constant. Thus, the
 332 first sample of this stage has the weight :math:`s= 1/\gamma` relative
 333 to the last sample of the previous covering stage. We assume that
 334 :math:`\Delta N` samples are collected and added to :math:`W` for each
 335 update . To keep :math:`N` constant, :math:`W` needs to be scaled down
 336 by a factor :math:`N/(N + \Delta N)` after every update. Equivalently,
 337 this means that new data is scaled up relative to old data by the
 338 inverse factor. Thus, after :math:`\Delta n` updates a new sample has
 339 the relative weight
 340 :math:`s=(1/\gamma) [(N_n + \Delta N)/N_n]^{\Delta n}`. Now assume
 341 covering occurs at this time. To continue to the next covering stage,
 342 :math:`N` should be scaled by :math:`\gamma`, which corresponds to again
 343 multiplying :math:`s` by :math:`1/\gamma`. If at this point
 344 :math:`s \ge \gamma`, then after rescaling :math:`s \ge 1`; i.e. overall
 345 the relative weight of a new sample relative to an old sample is still
 346 growing fast. If on the contrary :math:`s < \gamma`, and this defines
 347 the exit from the initial stage, then the initial stage is over and from
 348 now :math:`N` simply grows at the sampling rate (see
 349 :eq:`Eq. %s <eqawhwupdate>`). To really ensure that
 350 :math:`s\ge 1` holds before exiting, so that samples after the exit have
 351 at least the sample weight of older samples, the last covering stage is
 352 extended by a sufficient number of updates.
 353
 354 .. _awhtargets:
 355
 356 Choice of target distribution
 357 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 358
 359 The target distribution :math:`\rho(\lambda)` is traditionally chosen to
 360 be uniform
 361
 362 .. math:: \rho_{\mathrm{const}}(\lambda) = \mathrm{const.}
 363           :label: eqnawhuniformdist
 364
 365 This choice exactly flattens :math:`F(\lambda)` in user-defined
 366 sampling interval :math:`I`. Generally,
 367 :math:`\rho(\lambda)=0, \lambda\notin I`. In certain cases other choices
 368 may be preferable. For instance, in the multidimensional case the
 369 rectangular sampling interval is likely to contain regions of very high
 370 free energy, e.g. where atoms are clashing. To exclude such regions,
 371 :math:`\rho(\lambda)` can specified by the following function of the
 372 free energy
 373
 374 .. math:: \rho_{\mathrm{cut}}(\lambda) \propto \frac{1}{1+ e^{F(\lambda) - F_{\mathrm{cut}}}},
 375           :label: eqawhrhocut
 376
 377
 378 where :math:`F_{\mathrm{cut}}` is a free energy cutoff (relative to
 379 :math:`\min_\lambda F(\lambda)`). Thus, regions of the sampling interval
 380 where :math:`F(\lambda) > F_{\mathrm{cut}}` will be exponentially
 381 suppressed (in a smooth fashion). Alternatively, very high free energy
 382 regions could be avoided while still flattening more moderate free
 383 energy barriers by targeting a Boltzmann distribution corresponding to
 384 scaling :math:`\beta=1/k_BT` by a factor :math:`0<s_\beta<1`,
 385
 386 .. math:: \rho_{\mathrm{Boltz}}(\lambda) \propto e^{-s_\beta F(\lambda)},
 387           :label: eqawhrhoboltz
 388
 389 The parameter :math:`s_\beta` determines to what degree the free energy
 390 landscape is flattened; the lower :math:`s_\beta`, the flatter. Note
 391 that both :math:`\rho_{\mathrm{cut}}(\lambda)` and
 392 :math:`\rho_{\mathrm{Boltz}}(\lambda)` depend on :math:`F(\lambda)`,
 393 which needs to be substituted by the current best estimate
 394 :math:`F_n(\lambda)`. Thus, the target distribution is also updated
 395 (consistently with :eq:`Eq. %s <eqawhgrhofrelation>`).
 396
 397 There is in fact an alternative approach to obtaining
 398 :math:`\rho_{\mathrm{Boltz}}(\lambda)` as the limiting target
 399 distribution in AWH, which is particular in the way the weight histogram
 400 :math:`W(\lambda)` and the target distribution :math:`\rho` are updated
 401 and coupled to each other. This yields an evolution of the bias
 402 potential which is very similar to that of well-tempered
 403 metadynamics \ :ref:`142 <refbarducci2008well>`,
 404 see \ :ref:`137 <reflindahl2014accelerated>` for details. Because of the popularity and
 405 success of well-tempered metadynamics, this is a special case worth
 406 considering. In this case :math:`\rho` is a function of the reference
 407 weight histogram
 408
 409 .. math:: \rho_{\mathrm{Boltz,loc}}(\lambda) \propto W(\lambda),
 410           :label: eqnawhweighthistogram
 411
 412 and the update of the weight histogram is modified (cf.
 413 :eq:`Eq. %s <eqawhwupdate>`)
 414
 415 .. math:: W_{n+1}(\lambda) =  W_{n}(\lambda) + s_{\beta}\sum_t \omega(\lambda | x(t)).
 416           :label: eqnawhupdateweighthist
 417
 418 Thus, here the weight histogram equals the real history of samples, but
 419 scaled by :math:`s_\beta`. This target distribution is called *local*
 420 Boltzmann since :math:`W` is only modified locally, where sampling has
 421 taken place. We see that when :math:`s_\beta \approx 0` the histogram
 422 essentially does not grow and the size of the free energy update will
 423 stay at a constant value (as in the original formulation of
 424 metadynamics). Thus, the free energy estimate will not converge, but
 425 continue to fluctuate around the correct value. This illustrates the
 426 inherent coupling between the convergence and choice of target
 427 distribution for this special choice of target. Furthermore note that
 428 when using :math:`\rho=\rho_{\mathrm{Boltz,loc}}` there is no initial
 429 stage (section :ref:`awhinitialstage`). The rescaling of the weight
 430 histogram applied in the initial stage is a global operation, which is
 431 incompatible :math:`\rho_{\mathrm{Boltz,loc}}` only depending locally on
 432 the sampling history.
 433
 434 Lastly, the target distribution can be modulated by arbitrary
 435 probability weights
 436
 437 .. math:: \rho(\lambda) = \rho_0(\lambda) w_{\mathrm{user}}(\lambda).
 438           :label: eqnawhpropweigth
 439
 440 where :math:`w_{\mathrm{user}}(\lambda)` is provided by user data and
 441 in principle :math:`\rho_0(\lambda)` can be any of the target
 442 distributions mentioned above.
 443
 444 Multiple independent or sharing biases
 445 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 446
 447 Multiple independent bias potentials may be applied within one
 448 simulation. This only makes sense if the biased coordinates
 449 :math:`\xi^{(1)}`, :math:`\xi^{(2)}`, :math:`\ldots` evolve essentially
 450 independently from one another. A typical example of this would be when
 451 applying an independent bias to each monomer of a protein. Furthermore,
 452 multiple AWH simulations can be launched in parallel, each with a (set
 453 of) indepedendent biases.
 454
 455 If the defined sampling interval is large relative to the diffusion time
 456 of the reaction coordinate, traversing the sampling interval multiple
 457 times as is required by the initial stage
 458 (section :ref:`awhinitialstage`) may take an infeasible mount of
 459 simulation time. In these cases it could be advantageous to parallelize
 460 the work and have a group of multiple “walkers” :math:`\xi^{(i)}(t)`
 461 share a single bias potential. This can be achieved by collecting
 462 samples from all :math:`\xi^{(i)}` of the same sharing group into a
 463 single histogram and update a common free energy estimate. Samples can
 464 be shared between walkers within the simulation and/or between multiple
 465 simulations. However, currently only sharing between simulations is
 466 supported in the code while all biases within a simulation are
 467 independent.
 468
 469 Note that when attempting to shorten the simulation time by using
 470 bias-sharing walkers, care must be taken to ensure the simulations are
 471 still long enough to properly explore and equilibrate all regions of the
 472 sampling interval. To begin, the walkers in a group should be
 473 decorrelated and distributed approximately according to the target
 474 distribution before starting to refine the free energy. This can be
 475 achieved e.g. by “equilibrating” the shared weight histogram before
 476 letting it grow; for instance, :math:`W(\lambda)/N\approx \rho(\lambda)`
 477 with some tolerance.
 478
 479 Furthermore, the “covering” or transition criterion of the initial stage
 480 should to be generalized to detect when the sampling interval has been
 481 collectively traversed. One alternative is to just use the same
 482 criterion as for a single walker (but now with more samples), see
 483 :eq:`Eq. %s <eqawhcovering>`. However, in contrast to the
 484 single walker case this does not ensure that any real transitions across
 485 the sampling interval has taken place; in principle all walkers could be
 486 sampling only very locally and still cover the whole interval. Just as
 487 with a standard umbrella sampling procedure, the free energy may appear
 488 to be converged while in reality simulations sampling closeby
 489 :math:`\lambda` values are sampling disconnected regions of phase space.
 490 A stricter criterion, which helps avoid such issues, is to require that
 491 before a simulation marks a point :math:`\lambda_{\mu}` along dimension
 492 :math:`\mu` as visited, and shares this with the other walkers, also all
 493 points within a certain diameter :math:`D_{\mathrm{cover}}` should have
 494 been visited (i.e. fulfill :eq:`Eq. %s <eqawhcovering>`).
 495 Increasing :math:`D_{\mathrm{cover}}` increases robustness, but may slow
 496 down convergence. For the maximum value of :math:`D_{\mathrm{cover}}`,
 497 equal to the length of the sampling interval, the sampling interval is
 498 considered covered when at least one walker has independently traversed
 499 the sampling interval.
 500
 501 .. _awhreweight:
 502
 503 Reweighting and combining biased data
 504 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 505
 506 Often one may want to, post-simulation, calculate the unbiased PMF
 507 :math:`\Phi(u)` of another variable :math:`u(x)`. :math:`\Phi(u)` can be
 508 estimated using :math:`\xi`-biased data by reweighting (“unbiasing”) the
 509 trajectory using the bias potential :math:`U_{n(t)}`, see
 510 :eq:`Eq. %s <eqawhbiaspotential>`. Essentially, one bins the
 511 biased data along :math:`u` and removes the effect of :math:`U_{n(t)}`
 512 by dividing the weight of samples :math:`u(t)` by
 513 :math:`e^{-U_{n(t)}(\xi(t))}`,
 514
 515 .. math:: \hat{\Phi}(u)  = -\ln
 516           \sum_t 1_u(u(t))e^{U_{n(t)}(\xi(t)} \mathcal{Z}_{n(t)}.
 517           :label: eqawhunbias
 518
 519 Here the indicator function :math:`1_u` denotes the binning procedure:
 520 :math:`1_u(u') = 1` if :math:`u'` falls into the bin labeled by
 521 :math:`u` and :math:`0` otherwise. The normalization factor
 522 :math:`\mathcal{Z}_n = \int e^{-\Phi(\xi) - U_{n}(\xi)}d \xi` is the
 523 partition function of the extended ensemble. As can be seen
 524 :math:`\mathcal{Z}_n` depends on :math:`\Phi(\xi)`, the PMF of the
 525 (biased) reaction coordinate :math:`\xi` (which is calculated and
 526 written to file by the AWH simulation). It is advisable to use only
 527 final stage data in the reweighting procedure due to the rapid change of
 528 the bias potential during the initial stage. If one would include
 529 initial stage data, one should use the sample weights that are inferred
 530 by the repeated rescaling of the histogram in the initial stage, for the
 531 sake of consistency. Initial stage samples would then in any case be
 532 heavily scaled down relative to final stage samples. Note that
 533 :eq:`Eq. %s <eqawhunbias>` can also be used to combine data
 534 from multiple simulations (by adding another sum also over the
 535 trajectory set). Furthermore, when multiple independent AWH biases have
 536 generated a set of PMF estimates :math:`\{\hat{\Phi}^{(i)}(\xi)\}`, a
 537 combined best estimate :math:`\hat{\Phi}(\xi)` can be obtained by
 538 applying self-consistent exponential averaging. More details on this
 539 procedure and a derivation of :eq:`Eq. %s <eqawhunbias>`
 540 (using slightly different notation) can be found in :ref:`143 <reflindahl2017sequence>`.
 541
 542 .. _awhfriction:
 543
 544 The friction metric
 545 ~~~~~~~~~~~~~~~~~~~
 546
 547 During the AWH simulation, the following time-integrated force
 548 correlation function is calculated,
 549
 550 .. math:: \eta_{\mu\nu}(\lambda) =
 551           \beta
 552           \int_0^\infty
 553           \frac{
 554           \left<{\delta \mathcal{F}_{\mu}(x(t),\lambda)
 555           \delta \mathcal{F}_\nu(x(0),\lambda)
 556           \omega(\lambda|x(t)) \omega(\lambda|x(0))}\right>}
 557           {\left<{\omega^2(\lambda | x)}\right>}
 558           dt.
 559           :label: eqawhmetric
 560
 561 Here
 562 :math:`\mathcal F_\mu(x,\lambda) = k_\mu (\xi_\mu(x) - \lambda_\mu)` is
 563 the force along dimension :math:`\mu` from an harmonic potential
 564 centered at :math:`\lambda` and
 565 :math:`\delta \mathcal F_{\mu}(x,\lambda) = \mathcal F_{\mu}(x,\lambda) - \left<{\mathcal F_\mu(x,\lambda)}\right>`
 566 is the deviation of the force. The factors :math:`\omega(\lambda|x(t))`,
 567 see :eq:`Eq %s <eqawhomega>`, reweight the samples.
 568 :math:`\eta_{\mu\nu}(\lambda)` is a friction
 569 tensor :ref:`186 <reflindahl2018>` and :ref:`144 <refsivak2012thermodynamic>`. Its matrix elements are inversely proportional to local
 570 diffusion coefficients. A measure of sampling (in)efficiency at each
 571 :math:`\lambda` is given by
 572
 573 .. math:: \eta^{\frac{1}{2}}(\lambda) = \sqrt{\det\eta_{\mu\nu}(\lambda)}.
 574           :label: eqawhsqrtmetric
 575
 576 A large value of :math:`\eta^{\frac{1}{2}}(\lambda)` indicates slow
 577 dynamics and long correlation times, which may require more sampling.
 578
 579 .. _awhusage:
 580
 581 Usage
 582 ~~~~~
 583
 584 AWH stores data in the energy file (:ref:`edr`) with a frequency set by the
 585 user. The data – the PMF, the convolved bias, distributions of the
 586 :math:`\lambda` and :math:`\xi` coordinates, etc. – can be extracted
 587 after the simulation using the :ref:`gmx awh` tool. Furthermore, the trajectory
 588 of the reaction coordinate :math:`\xi(t)` is printed to the pull output
 589 file :math:`{\tt pullx.xvg}`. The log file (:ref:`log`) also contains
 590 information; check for messages starting with “awh”, they will tell you
 591 about covering and potential sampling issues.
 592
 593 Setting the initial update size
 594 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 595
 596 The initial value of the weight histogram size :math:`N` sets the
 597 initial update size (and the rate of change of the bias). When :math:`N`
 598 is kept constant, like in the initial stage, the average variance of the
 599 free energy scales as :math:`\varepsilon^2 \sim 1/(ND)`
 600 :ref:`137 <reflindahl2014accelerated>`, for a simple model system with constant diffusion
 601 :math:`D` along the reaction coordinate. This provides a ballpark
 602 estimate used by AWH to initialize :math:`N` in terms of more meaningful
 603 quantities
 604
 605 .. math:: \frac{1}{N_0} = \frac{1}{N_0(\varepsilon_0, D)} = \frac{1}{\Delta
 606           t_\mathrm{sample}} \max_d \frac{L_d^2}{2D_d} \varepsilon_0^2
 607           :label: eqawhn0
 608
 609 where :math:`L_d` is the length of the interval and :math:`D_d` is
 610 the diffusion constant along dimension :math:`d` of the AWH bias.
 611 For one dimension, :math:`L^2/2D` is the average time to diffuse
 612 over a distance of :math:`L`. We then takes the maximum crossing
 613 time over all dimensions involved in the bias.
 614 Essentially, this formula tells us that a slower system (small :math:`D`)
 615 requires more samples (larger :math:`N^0`) to attain the same level of
 616 accuracy (:math:`\varepsilon_0`) at a given sampling rate. Conversely,
 617 for a system of given diffusion, how to choose the initial biasing rate
 618 depends on how good the initial accuracy is. Both the initial error
 619 :math:`\varepsilon_0` and the diffusion :math:`D` only need to be
 620 roughly estimated or guessed. In the typical case, one would only tweak
 621 the :math:`D` parameter, and use a default value for
 622 :math:`\varepsilon_0`. For good convergence, :math:`D` should be chosen
 623 as large as possible (while maintaining a stable system) giving large
 624 initial bias updates and fast initial transitions. Choosing :math:`D`
 625 too small can lead to slow initial convergence. It may be a good idea to
 626 run a short trial simulation and after the first covering check the
 627 maximum free energy difference of the PMF estimate. If this is much
 628 larger than the expected magnitude of the free energy barriers that
 629 should be crossed, then the system is probably being pulled too hard and
 630 :math:`D` should be decreased. An accurate estimate of the diffusion
 631 can be obtaining from an AWH simulation with the :ref:`gmx awh` tool.
 632 :math:`\varepsilon_0` on the other hand, should be a rough estimate
 633 of the initial error.
 634
 635 Tips for efficient sampling
 636 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 637
 638 The force constant :math:`k` should be larger than the curvature of the
 639 PMF landscape. If this is not the case, the distributions of the
 640 reaction coordinate :math:`\xi` and the reference coordinate
 641 :math:`\lambda`, will differ significantly and warnings will be printed
 642 in the log file. One can choose :math:`k` as large as the time step
 643 supports. This will neccessarily increase the number of points of the
 644 discretized sampling interval :math:`I`. In general however, it should
 645 not affect the performance of the simulation noticeably because the AWH
 646 update is implemented such that only sampled points are accessed at free
 647 energy update time.
 648
 649 As with any method, the choice of reaction coordinate(s) is critical. If
 650 a single reaction coordinate does not suffice, identifying a second
 651 reaction coordinate and sampling the two-dimensional landscape may help.
 652 In this case, using a target distribution with a free energy cutoff (see
 653 :eq:`Eq. %s <eqawhrhocut>`) might be required to avoid
 654 sampling uninteresting regions of very high free energy. Obtaining
 655 accurate free energies for reaction coordinates of much higher
 656 dimensionality than 3 or possibly 4 is generally not feasible.
 657
 658 Monitoring the transition rate of :math:`\xi(t)`, across the sampling
 659 interval is also advisable. For reliable statistics (e.g. when
 660 reweighting the trajectory as described in section :ref:`awhreweight`),
 661 one would generally want to observe at least a few transitions after
 662 having exited the initial stage. Furthermore, if the dynamics of the
 663 reaction coordinate suddenly changes, this may be a sign of e.g. a
 664 reaction coordinate problem.
 665
 666 Difficult regions of sampling may also be detected by calculating the
 667 friction tensor :math:`\eta_{\mu\nu}(\lambda)` in the sampling interval,
 668 see section :ref:`awhfriction`. :math:`\eta_{\mu\nu}(\lambda)` as well
 669 as the sampling efficiency measure :math:`\eta^{\frac{1}{2}}(\lambda)`
 670 (:eq:`Eq. %s <eqawhsqrtmetric>`) are written to the energy file and can be
 671 extracted with :ref:`gmx awh`. A high peak in
 672 :math:`\eta^{\frac{1}{2}}(\lambda)` indicates that this region requires
 673 longer time to sample properly.