docs/manual/install.tex

   1 %
   2 % This file is part of the GROMACS molecular simulation package.
   3 %
   4 % Copyright (c) 2013,2014, by the GROMACS development team, led by
   5 % Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
   6 % and including many others, as listed in the AUTHORS file in the
   7 % top-level source directory and at http://www.gromacs.org.
   8 %
   9 % GROMACS is free software; you can redistribute it and/or
  10 % modify it under the terms of the GNU Lesser General Public License
  11 % as published by the Free Software Foundation; either version 2.1
  12 % of the License, or (at your option) any later version.
  13 %
  14 % GROMACS is distributed in the hope that it will be useful,
  15 % but WITHOUT ANY WARRANTY; without even the implied warranty of
  16 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  17 % Lesser General Public License for more details.
  18 %
  19 % You should have received a copy of the GNU Lesser General Public
  20 % License along with GROMACS; if not, see
  21 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
  22 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
  23 %
  24 % If you want to redistribute modifications to GROMACS, please
  25 % consider that scientific software is very special. Version
  26 % control is crucial - bugs must be traceable. We will be happy to
  27 % consider code for inclusion in the official distribution, but
  28 % derived work must not be called official GROMACS. Details are found
  29 % in the README & COPYING files - if they are missing, get the
  30 % official version at http://www.gromacs.org.
  31 %
  32 % To help us fund GROMACS development, we humbly ask that you cite
  33 % the research papers on the package. Check out http://www.gromacs.org.
  34
  35 \chapter{Technical Details}
  36
  37 \section{Mixed or Double precision}
  38 {\gromacs} can be compiled in either mixed\index{mixed
  39 precision|see{precision, mixed}}\index{precision, mixed} or
  40 \pawsindex{double}{precision}. Documentation of previous {\gromacs}
  41 versions referred to ``single precision'', but the implementation
  42 has made selective use of double precision for many years.
  43 Using single precision
  44 for all variables would lead to a significant reduction in accuracy.
  45 Although in ``mixed precision'' all state vectors, i.e. particle coordinates,
  46 velocities and forces, are stored in single precision, critical variables
  47 are double precision. A typical example of the latter is the virial,
  48 which is a sum over all forces in the system, which have varying signs.
  49 In addition, in many parts of the code we managed to avoid double precision
  50 for arithmetic, by paying attention to summation order or reorganization
  51 of mathematical expressions. The default configuration uses mixed precision,
  52 but it is easy to turn on double precision by adding the option
  53 {\tt -DGMX_DOUBLE=on} to {\tt cmake}. Double precision
  54 will be 20 to 100\% slower than mixed precision depending on the
   55 architecture you are running on. Double precision will also use somewhat
   56 more memory, and run input, energy and full-precision trajectory files
   57 will be almost twice as large.
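
As a minimal sketch of the configuration step described above, a
double-precision build could be set up as follows. The build directory
name {\tt build-double} and its location inside the source tree are
assumptions made only for this illustration.

\begin{verbatim}
mkdir build-double
cd build-double
cmake .. -DGMX_DOUBLE=on
make
\end{verbatim}

Omitting {\tt -DGMX_DOUBLE=on} gives the default mixed-precision build.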
  58
   59 The energies in mixed precision are accurate up to the last decimal;
   60 the last one or two decimals of the forces are non-significant.
  61 The virial is less accurate than the forces, since the virial is only one
  62 order of magnitude larger than the size of each element in the sum over
  63 all atoms (\secref{virial}).
  64 In most cases this is not really a problem, since the fluctuations in the
  65 virial can be two orders of magnitude larger than the average.
   66 Using cut-offs for the Coulomb interactions causes large errors
  67 in the energies, forces, and virial.
  68 Even when using a reaction-field or lattice sum method, the errors
  69 are larger than, or comparable to, the errors due to the partial use of
  70 single precision.
  71 Since MD is chaotic, trajectories with very similar starting conditions will
   72 diverge rapidly; the divergence is faster in mixed precision than in double
  73 precision.
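
The sensitivity of sums over terms with varying signs, such as the
virial, can be illustrated with the standard rounding-error bound for
adding $n$ floating-point numbers one after another. This is a generic
textbook estimate, not an analysis of the {\gromacs} kernels; the
symbols $x_i$, $\hat{S}$ and $u$ are introduced here only for
illustration.
\begin{equation}
\left| \hat{S} - \sum_{i=1}^{n} x_i \right| \;\leq\;
(n-1)\,u \sum_{i=1}^{n} |x_i| + O(u^2)
\end{equation}
Here $\hat{S}$ is the computed sum and $u$ is the unit roundoff, with
$u = 2^{-24} \approx 6 \times 10^{-8}$ in IEEE single precision and
$u = 2^{-53} \approx 1.1 \times 10^{-16}$ in double precision.
When the terms largely cancel, $|\sum_i x_i| \ll \sum_i |x_i|$, so the
error relative to the result can be much larger than $u$. This is
why such accumulations benefit from double precision.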
  74
  75 For most simulations, mixed precision is accurate enough.
  76 In some cases double precision is required to get reasonable results:
  77 \begin{itemize}
  78 \item normal mode analysis,
   79 for the conjugate gradient or L-BFGS minimization and the calculation and
  80 diagonalization of the Hessian
  81 \item long-term energy conservation, especially for large systems
  82 \end{itemize}
  83
  84 \section{Environment Variables}
  85 {\gromacs} programs may be influenced by the use of
  86 \normindex{environment variables}.  First of all, the variables set in
  87 the {\tt \normindex{GMXRC}} file are essential for running and
  88 compiling {\gromacs}. Some other useful environment variables are
  89 listed in the following sections. Most environment variables function
  90 by being set in your shell to any non-NULL value. Specific
  91 requirements are described below if other values need to be set. You
  92 should consult the documentation for your shell for instructions on
  93 how to set environment variables in the current shell, or in config
  94 files for future shells. Note that requirements for exporting
  95 environment variables to jobs run under batch control systems vary and
  96 you should consult your local documentation for details.
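
For illustration, assuming a Bourne-compatible shell such as {\tt bash},
variables described in the following sections could be set like this
before starting a {\gromacs} program. The values chosen are only
examples.

\begin{verbatim}
# flag-like variable: setting it to any value enables it
export GMX_NO_QUOTES=1
# variable that requires a specific value
export GMX_MAXBACKUP=-1
\end{verbatim}

In {\tt csh}-style shells the equivalent form is
{\tt setenv GMX_MAXBACKUP -1}.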
  97
  98 {\bf Output Control}
  99
 100 \begin{enumerate}
 101
 102 \item   {\tt GMX_CONSTRAINTVIR}: print constraint virial and force virial energy terms.
 103 \item   {\tt GMX_MAXBACKUP}: {\gromacs} automatically backs up old
 104         copies of files when trying to write a new file of the same
 105         name, and this variable controls the maximum number of
  106         backups that will be made, default 99. If set to 0, the program
  107         fails to run if any output file already exists. If set to -1, it
  108         overwrites any output file without making a backup.
 109 \item   {\tt GMX_NO_QUOTES}: if this is explicitly set, no cool quotes
 110         will be printed at the end of a program.
  111 \item   {\tt GMX_SUPPRESS_DUMP}: prevent dumping of step files when,
  112         for example, the system blows up because a constraint
  113         algorithm fails.
 114 \item   {\tt GMX_TPI_DUMP}: dump all configurations to a {\tt .pdb}
 115         file that have an interaction energy less than the value set
 116         in this environment variable.
  117 \item   {\tt GMX_VIEW_XPM}, {\tt GMX_VIEW_XVG}, {\tt
  118         GMX_VIEW_EPS} and {\tt GMX_VIEW_PDB}: commands used to
  119         automatically view \@ {\tt .xpm}, {\tt .xvg}, {\tt .eps}
 120         and {\tt .pdb} file types, respectively; they default to {\tt xv}, {\tt xmgrace},
 121         {\tt ghostview} and {\tt rasmol}. Set to empty to disable
 122         automatic viewing of a particular file type. The command will
 123         be forked off and run in the background at the same priority
 124         as the {\gromacs} tool (which might not be what you want).
 125         Be careful not to use a command which blocks the terminal
 126         ({\eg} {\tt vi}), since multiple instances might be run.
 127 \item   {\tt GMX_VIRIAL_TEMPERATURE}: print virial temperature energy term
 128 \item   {\tt GMX_LOG_BUFFER}: the size of the buffer for file I/O. When set
 129         to 0, all file I/O will be unbuffered and therefore very slow.
 130         This can be handy for debugging purposes, because it ensures
 131         that all files are always totally up-to-date.
 132 \item   {\tt GMX_LOGO_COLOR}: set display color for logo in {\tt \normindex{ngmx}}.
 133 \item   {\tt GMX_PRINT_LONGFORMAT}: use long float format when printing
 134         decimal values.
  135 \item   {\tt GMX_COMPELDUMP}: applies to computational electrophysiology setups
  136         only (see section \ref{sec:compel}). The initial structure gets dumped to
  137         a {\tt .pdb} file, which makes it possible to check whether multimeric channels have
  138         the correct PBC representation.
 139 \end{enumerate}
 140
 141
 142 {\bf Debugging}
 143
 144 \begin{enumerate}
 145
 146 \item   {\tt GMX_PRINT_DEBUG_LINES}: when set, print debugging info on line numbers.
 147 \item   {\tt GMX_DD_NST_DUMP}: number of steps that elapse between dumping
 148         the current DD to a PDB file (default 0). This only takes effect
 149         during domain decomposition, so it should typically be
 150         0 (never), 1 (every DD phase) or a multiple of {\tt nstlist}.
 151 \item   {\tt GMX_DD_NST_DUMP_GRID}: number of steps that elapse between dumping
 152         the current DD grid to a PDB file (default 0). This only takes effect
 153         during domain decomposition, so it should typically be
 154         0 (never), 1 (every DD phase) or a multiple of {\tt nstlist}.
 155 \item   {\tt GMX_DD_DEBUG}: general debugging trigger for every domain
 156         decomposition (default 0, meaning off). Currently only checks
 157         global-local atom index mapping for consistency.
 158 \item   {\tt GMX_DD_NPULSE}: over-ride the number of DD pulses used
 159         (default 0, meaning no over-ride). Normally 1 or 2.
 160
 161 %\item   There are a number of extra environment variables like these
 162 %        that are used in debugging - check the code!
 163
 164 \end{enumerate}
 165
 166 {\bf Performance and Run Control}
 167
 168 \begin{enumerate}
 169
 170 \item   {\tt GMX_DO_GALACTIC_DYNAMICS}: planetary simulations are made possible (just for fun) by setting
 171         this environment variable, which allows setting {\tt epsilon_r = -1} in the {\tt .mdp}
 172         file. Normally, {\tt epsilon_r} must be greater than zero to prevent a fatal error.
 173         See {\wwwpage} for example input files for a planetary simulation.
 174 \item   {\tt GMX_ALLOW_CPT_MISMATCH}: when set, runs will not exit if the
 175         ensemble set in the {\tt .tpr} file does not match that of the
 176         {\tt .cpt} file.
 177 \item   {\tt GMX_CUDA_NB_EWALD_TWINCUT}: force the use of twin-range cutoff kernel even if {\tt rvdw} =
 178         {\tt rcoulomb} after PP-PME load balancing. The switch to twin-range kernels is automated,
 179         so this variable should be used only for benchmarking.
 180 \item   {\tt GMX_CUDA_NB_ANA_EWALD}: force the use of analytical Ewald kernels. Should be used only for benchmarking.
 181 \item   {\tt GMX_CUDA_NB_TAB_EWALD}: force the use of tabulated Ewald kernels. Should be used only for benchmarking.
 182 \item   {\tt GMX_CUDA_STREAMSYNC}: force the use of cudaStreamSynchronize on ECC-enabled GPUs, which leads
 183         to performance loss due to a known CUDA driver bug present in API v5.0 NVIDIA drivers (pre-30x.xx).
 184         Cannot be set simultaneously with {\tt GMX_NO_CUDA_STREAMSYNC}.
 185 \item   {\tt GMX_CYCLE_ALL}: times all code during runs.  Incompatible with threads.
 186 \item   {\tt GMX_CYCLE_BARRIER}: calls MPI_Barrier before each cycle start/stop call.
 187 \item   {\tt GMX_DD_ORDER_ZYX}: build domain decomposition cells in the order
 188         (z, y, x) rather than the default (x, y, z).
 189 \item   {\tt GMX_DD_USE_SENDRECV2}: during constraint and vsite communication, use a pair
  190         of {\tt MPI_Sendrecv} calls instead of two simultaneous non-blocking calls
 191         (default 0, meaning off). Might be faster on some MPI implementations.
 192 \item   {\tt GMX_DLB_BASED_ON_FLOPS}: do domain-decomposition dynamic load balancing based on flop count rather than
 193         measured time elapsed (default 0, meaning off).
 194         This makes the load balancing reproducible, which can be useful for debugging purposes.
  195         A value of 1 uses the flops; a value $> 1$ adds (value - 1)*5\% of noise to the flops to increase the imbalance and the scaling.
 196 \item   {\tt GMX_DLB_MAX_BOX_SCALING}: maximum percentage box scaling permitted per domain-decomposition
 197         load-balancing step (default 10)
 198 \item   {\tt GMX_DD_RECORD_LOAD}: record DD load statistics for reporting at end of the run (default 1, meaning on)
 199 \item   {\tt GMX_DD_NST_SORT_CHARGE_GROUPS}: number of steps that elapse between re-sorting of the charge
 200         groups (default 1). This only takes effect during domain decomposition, so should typically
 201         be 0 (never), 1 (to mean at every domain decomposition), or a multiple of {\tt nstlist}.
 202 \item   {\tt GMX_DETAILED_PERF_STATS}: when set, print slightly more detailed performance information
  203         to the {\tt .log} file. The resulting output matches the way the performance summary was reported in versions
 204         4.5.x and thus may be useful for anyone using scripts to parse {\tt .log} files or standard output.
 205 \item   {\tt GMX_DISABLE_SIMD_KERNELS}: disables architecture-specific SIMD-optimized (SSE2, SSE4.1, AVX, etc.)
 206         non-bonded kernels thus forcing the use of plain C kernels.
 207 \item   {\tt GMX_DISABLE_CUDA_TIMING}: timing of asynchronously executed GPU operations can have a
 208         non-negligible overhead with short step times. Disabling timing can improve performance in these cases.
 209 \item   {\tt GMX_DISABLE_GPU_DETECTION}: when set, disables GPU detection even if {\tt \normindex{mdrun}} was compiled
 210         with GPU support.
 211 \item   {\tt GMX_DISABLE_PINHT}: disable pinning of consecutive threads to physical cores when using
 212         Intel hyperthreading. Controlled with {\tt \normindex{mdrun} -nopinht} and thus this environment
 213         variable will likely be removed.
 214 \item   {\tt GMX_DISRE_ENSEMBLE_SIZE}: the number of systems for distance restraint ensemble
 215         averaging. Takes an integer value.
 216 \item   {\tt GMX_EMULATE_GPU}: emulate GPU runs by using algorithmically equivalent CPU reference code instead of
 217         GPU-accelerated functions. As the CPU code is slow, it is intended to be used only for debugging purposes.
  218         This behavior is automatically triggered if non-bonded calculations are turned off using {\tt GMX_NO_NONBONDED},
  219         in which case the non-bonded calculations will not be called and the CPU-GPU transfer will also be skipped.
 220 \item   {\tt GMX_ENX_NO_FATAL}: disable exiting upon encountering a corrupted frame in an {\tt .edr}
 221         file, allowing the use of all frames up until the corruption.
 222 \item   {\tt GMX_FORCE_UPDATE}: update forces when invoking {\tt \normindex{mdrun} -rerun}.
 223 \item   {\tt GMX_GPU_ID}: set in the same way as the {\tt \normindex{mdrun}} option {\tt -gpu_id}, {\tt GMX_GPU_ID}
 224         allows the user to specify different GPU id-s, which can be useful for selecting different
 225         devices on different compute nodes in a cluster.  Cannot be used in conjunction with {\tt -gpu_id}.
 226 \item   {\tt GMX_IGNORE_FSYNC_FAILURE_ENV}: allow {\tt \normindex{mdrun}} to continue even if
 227         a file is missing.
 228 \item   {\tt GMX_LJCOMB_TOL}: when set to a floating-point value, overrides the default tolerance of
 229         1e-5 for force-field floating-point parameters.
 230 \item   {\tt GMX_MAX_MPI_THREADS}: sets the maximum number of MPI-threads that {\tt \normindex{mdrun}}
 231         can use.
 232 \item   {\tt GMX_MAXCONSTRWARN}: if set to -1, {\tt \normindex{mdrun}} will
 233         not exit if it produces too many LINCS warnings.
 234 \item   {\tt GMX_NB_GENERIC}: use the generic C kernel.  Should be set if using
 235         the group-based cutoff scheme and also sets {\tt GMX_NO_SOLV_OPT} to be true,
 236         thus disabling solvent optimizations as well.
 237 \item   {\tt GMX_NB_MIN_CI}: neighbor list balancing parameter used when running on GPU. Sets the
  238         target minimum number of pair-lists in order to improve multi-processor load-balance for better
  239         performance with small simulation systems. Must be set to a positive integer. The default value
  240         is optimized for NVIDIA Fermi and Kepler GPUs, so changing it is not necessary for
 241         normal usage, but it can be useful on future architectures.
 242 \item   {\tt GMX_NBLISTCG}: use neighbor list and kernels based on charge groups.
 243 \item   {\tt GMX_NBNXN_CYCLE}: when set, print detailed neighbor search cycle counting.
 244 \item   {\tt GMX_NBNXN_EWALD_ANALYTICAL}: force the use of analytical Ewald non-bonded kernels,
 245         mutually exclusive of {\tt GMX_NBNXN_EWALD_TABLE}.
 246 \item   {\tt GMX_NBNXN_EWALD_TABLE}: force the use of tabulated Ewald non-bonded kernels,
 247         mutually exclusive of {\tt GMX_NBNXN_EWALD_ANALYTICAL}.
 248 \item   {\tt GMX_NBNXN_SIMD_2XNN}: force the use of 2x(N+N) SIMD CPU non-bonded kernels,
 249         mutually exclusive of {\tt GMX_NBNXN_SIMD_4XN}.
 250 \item   {\tt GMX_NBNXN_SIMD_4XN}: force the use of 4xN SIMD CPU non-bonded kernels,
 251         mutually exclusive of {\tt GMX_NBNXN_SIMD_2XNN}.
 252 \item   {\tt GMX_NO_ALLVSALL}: disables optimized all-vs-all kernels.
 253 \item   {\tt GMX_NO_CART_REORDER}: used in initializing domain decomposition communicators. Rank reordering
 254         is default, but can be switched off with this environment variable.
 255 \item   {\tt GMX_NO_CUDA_STREAMSYNC}: the opposite of {\tt GMX_CUDA_STREAMSYNC}. Disables the use of the
 256         standard cudaStreamSynchronize-based GPU waiting to improve performance when using CUDA driver API
  257         earlier than v5.0 with ECC-enabled GPUs.
 258 \item   {\tt GMX_NO_INT}, {\tt GMX_NO_TERM}, {\tt GMX_NO_USR1}: disable signal handlers for SIGINT,
 259         SIGTERM, and SIGUSR1, respectively.
 260 \item   {\tt GMX_NO_NODECOMM}: do not use separate inter- and intra-node communicators.
 261 \item   {\tt GMX_NO_NONBONDED}: skip non-bonded calculations; can be used to estimate the possible
 262         performance gain from adding a GPU accelerator to the current hardware setup -- assuming that this is
 263         fast enough to complete the non-bonded calculations while the CPU does bonded force and PME computation.
 264 \item   {\tt GMX_NO_PULLVIR}: when set, do not add virial contribution to COM pull forces.
 265 \item   {\tt GMX_NOCHARGEGROUPS}: disables multi-atom charge groups, {\ie} each atom
 266         in all non-solvent molecules is assigned its own charge group.
 267 \item   {\tt GMX_NOPREDICT}: shell positions are not predicted.
 268 \item   {\tt GMX_NO_SOLV_OPT}: turns off solvent optimizations; automatic if {\tt GMX_NB_GENERIC}
 269         is enabled.
 270 \item   {\tt GMX_NSCELL_NCG}: the ideal number of charge groups per neighbor searching grid cell is hard-coded
 271         to a value of 10. Setting this environment variable to any other integer value overrides this hard-coded
 272         value.
 273 \item   {\tt GMX_PME_NTHREADS}: set the number of OpenMP or PME threads (overrides the number guessed by
  274         {\tt \normindex{mdrun}}).
 275 \item   {\tt GMX_PME_P3M}: use P3M-optimized influence function instead of smooth PME B-spline interpolation.
 276 \item   {\tt GMX_PME_THREAD_DIVISION}: PME thread division in the format ``x y z'' for all three dimensions. The
  277         product of the threads in each dimension must equal the total number of PME threads (set in
 278         {\tt GMX_PME_NTHREADS}).
 279 \item   {\tt GMX_PMEONEDD}: if the number of domain decomposition cells is set to 1 for both x and y,
 280         decompose PME in one dimension.
 281 \item   {\tt GMX_REQUIRE_SHELL_INIT}: require that shell positions are initiated.
 282 \item   {\tt GMX_REQUIRE_TABLES}: require the use of tabulated Coulombic
 283         and van der Waals interactions.
 284 \item   {\tt GMX_SCSIGMA_MIN}: the minimum value for soft-core $\sigma$. {\bf Note} that this value is set
 285         using the {\tt sc-sigma} keyword in the {\tt .mdp} file, but this environment variable can be used
 286         to reproduce pre-4.5 behavior with respect to this parameter.
 287 \item   {\tt GMX_TPIC_MASSES}: should contain multiple masses used for test particle insertion into a cavity.
 288         The center of mass of the last atoms is used for insertion into the cavity.
 289 \item   {\tt GMX_USE_GRAPH}: use graph for bonded interactions.
 290 \item   {\tt GMX_VERLET_BUFFER_RES}: resolution of buffer size in Verlet cutoff scheme.  The default value is
 291         0.001, but can be overridden with this environment variable.
 292 \item   {\tt GMX_VERLET_SCHEME}: convert from group-based to Verlet cutoff scheme, even if the {\tt cutoff_scheme} is
 293         not set to use Verlet in the {\tt .mdp} file. It is unnecessary since the {\tt -testverlet} option of
 294         {\tt \normindex{mdrun}} has the same functionality, but it is maintained for backwards compatibility.
 295 \item   {\tt MPIRUN}: the {\tt mpirun} command used by {\tt \normindex{g_tune_pme}}.
 296 \item   {\tt MDRUN}: the {\tt \normindex{mdrun}} command used by {\tt \normindex{g_tune_pme}}.
 297 \item   {\tt GMX_NSTLIST}: sets the default value for {\tt nstlist}, preventing it from being tuned during
 298         {\tt \normindex{mdrun}} startup when using the Verlet cutoff scheme.
  299 \item   {\tt GMX_USE_TREEREDUCE}: use tree reduction for nbnxn force reduction. Potentially faster for a large number of
 300         OpenMP threads (if memory locality is important).
 301
 302 \end{enumerate}
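
As a worked example of the {\tt GMX_DLB_BASED_ON_FLOPS} rule in the list
above, write $v$ for the value of the variable (a symbol introduced here
only for illustration). For $v \geq 1$ the noise added to the flop count
is then
\begin{equation}
\mathrm{noise} = (v - 1) \times 5\%
\end{equation}
so $v = 1$ adds no noise, $v = 2$ adds 5\%, and $v = 3$ adds 10\%.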
 303
 304 {\bf Analysis and Core Functions}
 305
 306 \begin{enumerate}
 307
 308 \item   {\tt GMX_QM_ACCURACY}: accuracy in Gaussian L510 (MC-SCF) component program.
 309 \item   {\tt GMX_QM_ORCA_BASENAME}: prefix of {\tt .tpr} files, used in Orca calculations
 310         for input and output file names.
 311 \item   {\tt GMX_QM_CPMCSCF}: when set to a nonzero value, Gaussian QM calculations will
 312         iteratively solve the CP-MCSCF equations.
 313 \item   {\tt GMX_QM_MODIFIED_LINKS_DIR}: location of modified links in Gaussian.
 314 \item   {\tt DSSP}: used by {\tt \normindex{do_dssp}} to point to the {\tt dssp}
 315         executable (not just its path).
 316 \item   {\tt GMX_QM_GAUSS_DIR}: directory where Gaussian is installed.
 317 \item   {\tt GMX_QM_GAUSS_EXE}: name of the Gaussian executable.
 318 \item   {\tt GMX_DIPOLE_SPACING}: spacing used by {\tt \normindex{g_dipoles}}.
 319 \item   {\tt GMX_MAXRESRENUM}: sets the maximum number of residues to be renumbered by
 320         {\tt \normindex{grompp}}. A value of -1 indicates all residues should be renumbered.
 321 \item   {\tt GMX_FFRTP_TER_RENAME}: Some force fields (like AMBER) use specific names for N- and C-
 322         terminal residues (NXXX and CXXX) as {\tt .rtp} entries that are normally renamed. Setting
 323         this environment variable disables this renaming.
 324 \item   {\tt GMX_PATH_GZIP}: {\tt gunzip} executable, used by {\tt \normindex{g_wham}}.
 325 \item   {\tt GMX_FONT}: name of X11 font used by {\tt \normindex{ngmx}}.
 326 \item   {\tt GMXTIMEUNIT}: the time unit used in output files, can be
  327         any of fs, ps, ns, us, ms, s, m or h.
 328 \item   {\tt GMX_QM_GAUSSIAN_MEMORY}: memory used for Gaussian QM calculation.
 329 \item   {\tt MULTIPROT}: name of the {\tt multiprot} executable, used by the
 330         contributed program {\tt \normindex{do_multiprot}}.
 331 \item   {\tt NCPUS}: number of CPUs to be used for Gaussian QM calculation
 332 \item   {\tt GMX_ORCA_PATH}: directory where Orca is installed.
 333 \item   {\tt GMX_QM_SA_STEP}: simulated annealing step size for Gaussian QM calculation.
 334 \item   {\tt GMX_QM_GROUND_STATE}: defines state for Gaussian surface hopping calculation.
 335 \item   {\tt GMX_TOTAL}: name of the {\tt total} executable used by the contributed
 336         {\tt \normindex{do_shift}} program.
 337 \item   {\tt GMX_ENER_VERBOSE}: make {\tt \normindex{g_energy}} and {\tt \normindex{eneconv}}
 338         loud and noisy.
 339 \item   {\tt VMD_PLUGIN_PATH}: where to find VMD plug-ins. Needed to be
 340         able to read file formats recognized only by a VMD plug-in.
 341 \item   {\tt VMDDIR}: base path of VMD installation.
 342 \item   {\tt GMX_USE_XMGR}: sets viewer to {\tt xmgr} (deprecated) instead of {\tt xmgrace}.
 343
 344 \end{enumerate}
 345
 346 \section{Running {\gromacs} in parallel}
 347 By default {\gromacs} will be compiled with the built-in thread-MPI library.
 348 This library handles communication between threads on a single
 349 node more efficiently than using an external MPI library.
 350 To run {\gromacs} in parallel over multiple nodes, e.g. on a cluster,
 351 you need to configure and compile {\gromacs} with an external
 352 MPI library. All supercomputers are shipped with MPI libraries optimized for
 353 that particular platform, and there are several good free MPI
 354 implementations; OpenMPI is usually a good choice.
 355 Note that MPI and thread-MPI support are mutually incompatible.
 356
  357 In addition to MPI parallelization, {\gromacs} also supports
  358 thread-parallelization through \normindex{OpenMP}. MPI and OpenMP parallelization
  359 can be combined, which results in so-called hybrid parallelization. It can offer
 360 better performance and scaling in some cases.
 361
 362 See {\wwwpage} for details on the use and performance of the different
 363 parallelization schemes.
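
The two modes described above might be invoked as sketched below. The
rank and thread counts, the run input file name {\tt topol.tpr} and the
binary name {\tt mdrun_mpi} are assumptions for this illustration; the
name of the MPI-enabled binary and the MPI launcher depend on how
{\gromacs} and the MPI library were installed.

\begin{verbatim}
# single node, built-in thread-MPI:
# 4 thread-MPI ranks with 2 OpenMP threads each
mdrun -ntmpi 4 -ntomp 2 -s topol.tpr

# multiple nodes, external MPI library
# (configured with cmake -DGMX_MPI=on):
# 8 MPI ranks with 2 OpenMP threads each
mpirun -np 8 mdrun_mpi -ntomp 2 -s topol.tpr
\end{verbatim}

The second command is an example of hybrid MPI and OpenMP
parallelization.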
 364
 365 \section{Running {\gromacs} on \normindex{GPUs}}
 366 As of version 4.6, {\gromacs} has native GPU support through CUDA.
 367 Note that {\gromacs} only off-loads the most compute intensive parts
 368 to the GPU, currently the non-bonded interactions, and does all other
 369 parts of the MD calculation on the CPU. The requirements for the CUDA code
 370 are an Nvidia GPU with compute capability $\geq 2.0$, i.e. at
 371 least Fermi class.
 372 In many cases {\tt cmake} can auto-detect GPUs and the support will be
 373 configured automatically. To be sure GPU support is configured, pass
 374 the {\tt -DGMX_GPU=on} option to {\tt cmake}. The actual use of GPUs
 375 is decided at run time by {\tt mdrun}, depending on the availability
 376 of (suitable) GPUs and on the run input settings. A binary compiled
  377 with GPU support can also run CPU-only simulations. Use {\tt mdrun -nb cpu}
 378 to force a simulation to run on CPUs only. Only simulations with the Verlet
 379 cut-off scheme will run on a GPU. To test performance of old tpr files
 380 with GPUs, you can use the {\tt -testverlet} option of {\tt mdrun},
  381 but as this doesn't do the full parameter consistency check of {\tt grompp},
 382 you should not use this option for production simulations.
 383 Getting good performance with {\gromacs} on GPUs is easy,
  384 but getting the best performance can be difficult.
 385 Please check {\wwwpage} for up to date information on GPU usage.
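
The options mentioned in this section could be used as sketched below.
The file names {\tt topol.tpr} and {\tt old.tpr} and the relative source
path {\tt ..} are placeholders chosen for this illustration.

\begin{verbatim}
# configure with GPU support explicitly requested
cmake .. -DGMX_GPU=on

# run on CPUs only, even with a GPU-enabled binary
mdrun -nb cpu -s topol.tpr

# test the performance of an old run input file with the
# Verlet cut-off scheme (not for production simulations)
mdrun -testverlet -s old.tpr
\end{verbatim}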
 386
 387 % LocalWords:  Opteron Itanium PowerPC Altivec Athlon Fortran virial bfgs Nasm
 388 % LocalWords:  diagonalization Cygwin MPI Multi GMXHOME extern gmx tx pid buf
 389 % LocalWords:  bufsize txs rx rxs init nprocs fp msg GMXRC DUMPNL BUFS GMXNPRI
 390 % LocalWords:  unbuffered SGI npri mdrun covar nmeig setenv XPM XVG EPS
 391 % LocalWords:  PDB xvg xpm eps pdb xmgrace ghostview rasmol GMXTIMEUNIT fs dssp
 392 % LocalWords:  mpi distclean ing mpirun goofus doofus fred topol np
 393 % LocalWords:  internet gromacs DGMX cmake SIMD intrinsics AVX PME XN
 394 % LocalWords:  Verlet pre config CONSTRAINTVIR MAXBACKUP TPI ngmx mdp
 395 % LocalWords:  LONGFORMAT DISTGCT CPT tpr cpt CUDA EWALD TWINCUT rvdw
 396 % LocalWords:  rcoulomb STREAMSYNC cudaStreamSynchronized ECC GPUs sc
 397 % LocalWords:  ZYX PERF GPU PINHT hyperthreading DISRE NONBONDED ENX
 398 % LocalWords:  edr ENER gpu FSYNC ENV LJCOMB TOL MAXCONSTRWARN LINCS
 399 % LocalWords:  SOLV NBLISTCG NBNXN XNN ALLVSALL cudaStreamSynchronize
 400 % LocalWords:  USR SIGINT SIGTERM SIGUSR NODECOMM intra PULLVIR multi
 401 % LocalWords:  NOCHARGEGROUPS NOPREDICT NSCELL NCG NTHREADS OpenMP CP
 402 % LocalWords:  PMEONEDD Coulombic der Waals SCSIGMA TPIC GMXNPRIALL
 403 % LocalWords:  GOMP KMP pme NSTLIST ENVVAR nstlist startup OMP NUM ps
 404 % LocalWords:  ACC SCF BASENAME Orca CPMCSCF MCSCF DEVEL EXE GKRWIDTH
 405 % LocalWords:  MAXRESRENUM grompp FFRTP TER NXXX CXXX rtp GZIP gunzip
 406 % LocalWords:  GMXFONT ns MEM MULTIPROT multiprot NCPUS CPUs OPENMM
 407 % LocalWords:  PLUGIN OpenMM plugins SASTEP TESTMC eneconv VMD VMDDIR
 408 % LocalWords:  GMX_USE_XMGR xmgr parallelization nt online Nvidia nb cpu
 409 % LocalWords:  testverlet grommp