manual/averages.tex

   1 %
   2 % This file is part of the GROMACS molecular simulation package.
   3 %
   4 % Copyright (c) 2013, by the GROMACS development team, led by
   5 % Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
   6 % and including many others, as listed in the AUTHORS file in the
   7 % top-level source directory and at http://www.gromacs.org.
   8 %
   9 % GROMACS is free software; you can redistribute it and/or
  10 % modify it under the terms of the GNU Lesser General Public License
  11 % as published by the Free Software Foundation; either version 2.1
  12 % of the License, or (at your option) any later version.
  13 %
  14 % GROMACS is distributed in the hope that it will be useful,
  15 % but WITHOUT ANY WARRANTY; without even the implied warranty of
  16 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  17 % Lesser General Public License for more details.
  18 %
  19 % You should have received a copy of the GNU Lesser General Public
  20 % License along with GROMACS; if not, see
  21 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
  22 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
  23 %
  24 % If you want to redistribute modifications to GROMACS, please
  25 % consider that scientific software is very special. Version
  26 % control is crucial - bugs must be traceable. We will be happy to
  27 % consider code for inclusion in the official distribution, but
  28 % derived work must not be called official GROMACS. Details are found
  29 % in the README & COPYING files - if they are missing, get the
  30 % official version at http://www.gromacs.org.
  31 %
  32 % To help us fund GROMACS development, we humbly ask that you cite
  33 % the research papers on the package. Check out http://www.gromacs.org.
  34
  35 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  36 %
  37 %       AVERAGES AND FLUCTUATIONS
  38 %
  39 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  40 \chapter{Averages and fluctuations}
  41 \section{Formulae for averaging}
  42 {\bf Note:} this section was taken from ref~\cite{Gunsteren94a}.
  43
  44 When analyzing a MD trajectory averages $\left<x\right>$ and fluctuations
  45 \beq
  46  \left<(\Delta x)^2\right>^{\half} ~=~ \left<[x-\left<x\right>]^2\right>^{\half}
  47 \label{eqn:var0}
  48 \eeq
  49 of a quantity $x$ are to be computed.
  50 The variance $\sigma_x$ of a series of N$_x$ values,
  51 \{x$_i$\}, can be computed from
  52 \beq
  53 \sigma_x~=~ \sum_{i=1}^{N_x} x_i^2 ~-~  \frac{1}{N_x}\left(\sum_{i=1}^{N_x}x_i\right)^2
  54 \label{eqn:var1}
  55 \eeq
  56 Unfortunately this formula is numerically not very accurate,
  57 especially when $\sigma_x^{\half}$ is small compared to the values of $x_i$.
  58 The following (equivalent) expression is numerically more accurate
  59 \beq
  60 \sigma_x ~=~ \sum_{i=1}^{N_x} [x_i  - \left<x\right>]^2
  61 \eeq
  62 with
  63 \beq
  64   \left<x\right> ~=~ \frac{1}{N_x} \sum_{i=1}^{N_x} x_i
  65 \label{eqn:var2}
  66 \eeq
  67 Using ~\eqnsref{var1}{var2} one has to go
  68 through the series of $x_i$ values twice, once to determine
  69 $\left<x\right>$ and again to
  70 compute $\sigma_x$,
  71 whereas \eqnref{var0} requires only one sequential scan of
  72 the series \{x$_i$\}. However, one may cast \eqnref{var1} in
  73 another form, containing partial sums, which allows for a sequential
  74 update algorithm. Define the partial sum
  75 \beq
  76           X_{n,m} ~=~ \sum_{i=n}^{m} x_i
  77 \eeq
  78 and the partial variance
  79 \beq
  80     \sigma_{n,m} ~=~ \sum_{i=n}^{m}  \left[x_i - \frac{X_{n,m}}{m-n+1}\right]^2
  81 \label{eqn:sigma}
  82 \eeq
  83 It can be shown that
  84 \beq
  85           X_{n,m+k} ~=~  X_{n,m} + X_{m+1,m+k}
  86 \label{eqn:Xpartial}
  87 \eeq
  88 and
  89 \bea
  90 \sigma_{n,m+k} &=& \sigma_{n,m} + \sigma_{m+1,m+k} + \left[~\frac {X_{n,m}}{m-n+1} - \frac{X_{n,m+k}}{m+k-n+1}~\right]^2~* \nonumber\\
  91    && ~\frac{(m-n+1)(m+k-n+1)}{k}
  92 \label{eqn:varpartial}
  93 \eea
  94 For $n=1$ one finds
  95 \beq
  96 \sigma_{1,m+k} ~=~ \sigma_{1,m} + \sigma_{m+1,m+k}~+~
  97   \left[~\frac{X_{1,m}}{m} - \frac{X_{1,m+k}}{m+k}~\right]^2~ \frac{m(m+k)}{k}
  98 \label{eqn:sig1}
  99 \eeq
 100 and for $n=1$ and $k=1$ ~(\eqnref{varpartial}) becomes
 101 \bea
 102 \sigma_{1,m+1}  &=& \sigma_{1,m} +
 103                         \left[\frac{X_{1,m}}{m} - \frac{X_{1,m+1}}{m+1}\right]^2 m(m+1)\\
 104                 &=& \sigma_{1,m} +
 105                         \frac {[~X_{1,m} - m x_{m+1}~]^2}{m(m+1)}
 106 \label{eqn:simplevar0}
 107 \eea
 108 where we have used the relation
 109 \beq
 110      X_{1,m+1} ~=~  X_{1,m} + x_{m+1}
 111 \label{eqn:simplevar1}
 112 \eeq
 113 Using formulae~(\eqnref{simplevar0}) and ~(\eqnref{simplevar1}) the average
 114 \beq
 115 \left<x\right> ~=~ \frac{X_{1,N_x}}{N_x}
 116 \eeq
 117 and the fluctuation
 118 \beq
 119 \left<(\Delta x)^2\right>^{\half} = \left[\frac {\sigma_{1,N_x}}{N_x}\right]^{\half}
 120 \eeq
 121 can be obtained by one sweep through the data.
 122
 123 \section{Implementation}
 124 In {\gromacs} the instantaneous
 125 energies $E(m)$ are stored in the \swapindex{energy}{file}, along with the
 126 values of $\sigma_{1,m}$ and $X_{1,m}$. Although the steps are counted from 0,
 127 for the energy and fluctuations steps are counted from 1. This means that the
 128 equations presented here are the ones that are implemented.
 129 We give somewhat lengthy derivations in this section
 130 to simplify checking of code and equations later on.
 131
 132 \subsection{Part of a Simulation}
 133 It is not uncommon to perform a simulation where the first part,
 134 {\eg} 100 ps, is taken as \normindex{equilibration}. However, the
 135 averages and fluctuations as printed in the \swapindex{log}{file}
 136 are computed over the whole simulation. The equilibration time,
 137 which is now part of the simulation, may in such a case invalidate the
 138 averages and fluctuations, because these numbers are now dominated
 139 by the initial drift towards equilibrium.
 140
 141 Using~\eqnsref{Xpartial}{varpartial} the average and
 142 standard deviation over part of the trajectory can be computed as:
 143 \bea
 144 X_{m+1,m+k}     &=& X_{1,m+k} - X_{1,m}                 \\
 145 \sigma_{m+1,m+k} &=& \sigma_{1,m+k}-\sigma_{1,m} - \left[~\frac{X_{1,m}}{m} - \frac{X_{1,m+k}}{m+k}~\right]^{2}~ \frac{m(m+k)}{k}
 146 \eea
 147
 148 or, more generally (with $p \geq 1$ and $q \geq p$):
 149 \bea
 150 X_{p,q}         &=&     X_{1,q} - X_{1,p-1}     \\
 151 \sigma_{p,q}    &=&     \sigma_{1,q}-\sigma_{1,p-1} - \left[~\frac{X_{1,p-1}}{p-1} - \frac{X_{1,q}}{q}~\right]^{2}~ \frac{(p-1)q}{q-p+1}
 152 \eea
 153 {\bf Note} that implementation of this is not entirely trivial, since energies
 154 are not stored every time step of the simulation. We therefore have to construct
 155 $X_{1,p-1}$ and $\sigma_{1,p-1}$ from the information at time $p$ using
 156 \eqnsref{simplevar0}{simplevar1}:
 157 \bea
 158 X_{1,p-1}       &=&     X_{1,p} - x_p   \\
 159 \sigma_{1,p-1}  &=&     \sigma_{1,p} -  \frac {[~X_{1,p-1} - (p-1) x_{p}~]^2}{(p-1)p}
 160 \eea
 161
 162 \subsection{Combining two simulations}
 163 Another frequently occurring problem is, that the fluctuations of two simulations
 164 must be combined. Consider the following example: we have two simulations
 165 (A) of $n$ and (B) of $m$ steps, in which the second simulation is a
 166 continuation of the first. However, the second simulation starts numbering from 1
 167 instead of from $n+1$. For the partial sum
 168 this is no problem, we have to add $X_{1,n}^A$ from run A:
 169 \beq
 170 X_{1,n+m}^{AB} ~=~ X_{1,n}^A + X_{1,m}^B
 171 \label{eqn:pscomb}
 172 \eeq
 173 When we want to compute the partial variance from the two components we have to
 174 make a correction $\Delta\sigma$:
 175 \beq
 176 \sigma_{1,n+m}^{AB} ~=~ \sigma_{1,n}^A + \sigma_{1,m}^B +\Delta\sigma
 177 \eeq
 178 if we define $x_i^{AB}$ as the combined and renumbered set of data points we can
 179 write:
 180 \beq
 181 \sigma_{1,n+m}^{AB} ~=~ \sum_{i=1}^{n+m}  \left[x_i^{AB} - \frac{X_{1,n+m}^{AB}}{n+m}\right]^2
 182 \eeq
 183 and thus
 184 \beq
 185 \sum_{i=1}^{n+m}  \left[x_i^{AB} - \frac{X_{1,n+m}^{AB}}{n+m}\right]^2  ~=~
 186 \sum_{i=1}^{n}  \left[x_i^{A} - \frac{X_{1,n}^{A}}{n}\right]^2  +
 187 \sum_{i=1}^{m}  \left[x_i^{B} - \frac{X_{1,m}^{B}}{m}\right]^2  +\Delta\sigma
 188 \eeq
 189 or
 190 \bea
 191 \sum_{i=1}^{n+m}  \left[(x_i^{AB})^2 - 2 x_i^{AB}\frac{X^{AB}_{1,n+m}}{n+m} + \left(\frac{X^{AB}_{1,n+m}}{n+m}\right)^2  \right] &-& \nonumber \\
 192 \sum_{i=1}^{n}  \left[(x_i^{A})^2 - 2 x_i^{A}\frac{X^A_{1,n}}{n} + \left(\frac{X^A_{1,n}}{n}\right)^2  \right] &-& \nonumber \\
 193 \sum_{i=1}^{m}  \left[(x_i^{B})^2 - 2 x_i^{B}\frac{X^B_{1,m}}{m} + \left(\frac{X^B_{1,m}}{m}\right)^2  \right] &=& \Delta\sigma
 194 \eea
 195 all the $x_i^2$ terms drop out, and the terms independent of the summation
 196 counter $i$ can be simplified:
 197 \bea
 198 \frac{\left(X^{AB}_{1,n+m}\right)^2}{n+m} \,-\,
 199 \frac{\left(X^A_{1,n}\right)^2}{n} \,-\,
 200 \frac{\left(X^B_{1,m}\right)^2}{m} &-& \nonumber \\
 201 2\,\frac{X^{AB}_{1,n+m}}{n+m}\sum_{i=1}^{n+m}x_i^{AB} \,+\,
 202 2\,\frac{X^{A}_{1,n}}{n}\sum_{i=1}^{n}x_i^{A} \,+\,
 203 2\,\frac{X^{B}_{1,m}}{m}\sum_{i=1}^{m}x_i^{B} &=& \Delta\sigma
 204 \eea
 205 we recognize the three partial sums on the second line and use \eqnref{pscomb}
 206 to obtain:
 207 \beq
 208 \Delta\sigma ~=~ \frac{\left(mX^A_{1,n} - nX^B_{1,m}\right)^2}{nm(n+m)}
 209 \eeq
 210 if we check this by inserting $m=1$ we get back \eqnref{simplevar0}
 211
 212 \subsection{Summing energy terms}
 213 The {\tt \normindex{g_energy}} program can also sum energy terms into one, {\eg}
 214 potential + kinetic = total. For the partial averages this is again easy
 215 if we have $S$ energy components $s$:
 216 \beq
 217 X_{m,n}^S ~=~ \sum_{i=m}^n \sum_{s=1}^S x_i^s ~=~ \sum_{s=1}^S \sum_{i=m}^n x_i^s ~=~ \sum_{s=1}^S X_{m,n}^s
 218 \label{eqn:sumterms}
 219 \eeq
 220 For the fluctuations it is less trivial again, considering for example
 221 that the fluctuation in potential and kinetic energy should cancel.
 222 Nevertheless we can try the same approach as before by writing:
 223 \beq
 224 \sigma_{m,n}^S ~=~ \sum_{s=1}^S \sigma_{m,n}^s + \Delta\sigma
 225 \eeq
 226 if we fill in \eqnref{sigma}:
 227 \beq
 228 \sum_{i=m}^n \left[\left(\sum_{s=1}^S x_i^s\right) - \frac{X_{m,n}^S}{m-n+1}\right]^2 ~=~
 229 \sum_{s=1}^S \sum_{i=m}^n \left[\left(x_i^s\right) - \frac{X_{m,n}^s}{m-n+1}\right]^2 + \Delta\sigma
 230 \label{eqn:sigmaterms}
 231 \eeq
 232 which we can expand to:
 233 \bea
 234 &~&\sum_{i=m}^n \left[\sum_{s=1}^S (x_i^s)^2 + \left(\frac{X_{m,n}^S}{m-n+1}\right)^2 -2\left(\frac{X_{m,n}^S}{m-n+1}\sum_{s=1}^S x_i^s + \sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'} \right)\right]    \nonumber \\
 235 &-&\sum_{s=1}^S \sum_{i=m}^n \left[(x_i^s)^2 - 2\,\frac{X_{m,n}^s}{m-n+1}\,x_i^s + \left(\frac{X_{m,n}^s}{m-n+1}\right)^2\right] ~=~\Delta\sigma
 236 \eea
 237 the terms with $(x_i^s)^2$ cancel, so that we can simplify to:
 238 \bea
 239 &~&\frac{\left(X_{m,n}^S\right)^2}{m-n+1} -2 \frac{X_{m,n}^S}{m-n+1}\sum_{i=m}^n\sum_{s=1}^S x_i^s -2\sum_{i=m}^n\sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'}\, -        \nonumber \\
 240 &~&\sum_{s=1}^S \sum_{i=m}^n \left[- 2\,\frac{X_{m,n}^s}{m-n+1}\,x_i^s + \left(\frac{X_{m,n}^s}{m-n+1}\right)^2\right] ~=~\Delta\sigma
 241 \eea
 242 or
 243 \beq
 244 -\frac{\left(X_{m,n}^S\right)^2}{m-n+1}  -2\sum_{i=m}^n\sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'}\, +  \sum_{s=1}^S \frac{\left(X_{m,n}^s\right)^2}{m-n+1}  ~=~\Delta\sigma
 245 \eeq
 246 If we now expand the first term using \eqnref{sumterms} we obtain:
 247 \beq
 248 -\frac{\left(\sum_{s=1}^SX_{m,n}^s\right)^2}{m-n+1}  -2\sum_{i=m}^n\sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'}\, +      \sum_{s=1}^S \frac{\left(X_{m,n}^s\right)^2}{m-n+1}  ~=~\Delta\sigma
 249 \eeq
 250 which we can reformulate to:
 251 \beq
 252 -2\left[\sum_{s=1}^S \sum_{s'=s+1}^S X_{m,n}^s X_{m,n}^{s'}\,+\sum_{i=m}^n\sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'}\right] ~=~\Delta\sigma
 253 \eeq
 254 or
 255 \beq
 256 -2\left[\sum_{s=1}^S X_{m,n}^s \sum_{s'=s+1}^S X_{m,n}^{s'}\,+\,\sum_{s=1}^S \sum_{i=m}^nx_i^s \sum_{s'=s+1}^S x_i^{s'}\right] ~=~\Delta\sigma
 257 \eeq
 258 which gives
 259 \beq
 260 -2\sum_{s=1}^S \left[X_{m,n}^s \sum_{s'=s+1}^S \sum_{i=m}^n x_i^{s'}\,+\,\sum_{i=m}^n x_i^s \sum_{s'=s+1}^S x_i^{s'}\right] ~=~\Delta\sigma
 261 \eeq
 262 Since we need all data points $i$ to evaluate this, in general this is not possible.
 263 We can then make an estimate of $\sigma_{m,n}^S$ using only the data points that
 264 are available using the left hand side of \eqnref{sigmaterms}. While the average can
 265 be computed using all time steps in the simulation, the accuracy of the
 266 fluctuations is thus limited by the frequency with which energies are saved.
 267 Since this can be easily done with a program such as \normindex{xmgr} this is not
 268 built-in in {\gromacs}.
 269
 270 % LocalWords:  Formulae varpartial formulae simplevar Xpartial pscomb mX nX SX
 271 % LocalWords:  sumterms nx sigmaterms xmgr ps nm