+.. NOTE: Below is a useful bash one-liner to verify whether there are variables in this file
+.. no longer present in the code.
+.. ( export INPUT_FILE='docs/user-guide/environment-variables.rst' GIT_PAGER="cat "; for s in $(grep '^`' $INPUT_FILE | sed 's/`//g' | sed 's/,/ /g'); do count=$(git grep $s | grep -v $INPUT_FILE | wc -l); [ $count -eq 0 ] && printf "%-30s%s\n" $s $count; done ; )
+.. Another useful one-liner to find undocumented variables:
+.. ( export INPUT_FILE=docs/user-guide/environment-variables.rst; GIT_PAGER="cat "; for ss in `for s in $(git grep getenv | sed 's/.*getenv("\(.*\)".*/\1/' | sort -u | grep '^[A-Z]'); do [ $(grep $s $INPUT_FILE -c) -eq 0 ] && echo $s; done `; do git grep $ss ; done )
+
Environment Variables
=====================
Output Control
--------------
-``GMX_CONSTRAINTVIR``
- Print constraint virial and force virial energy terms.
+``GMX_DUMP_NL``
+ Neighbour list dump level; default 0.
``GMX_MAXBACKUP``
|Gromacs| automatically backs up old
Be careful not to use a command which blocks the terminal
(e.g. ``vi``), since multiple instances might be run.
-``GMX_VIRIAL_TEMPERATURE``
- print virial temperature energy term
-
``GMX_LOG_BUFFER``
the size of the buffer for file I/O. When set
to 0, all file I/O will be unbuffered and therefore very slow.
Defaults to 1, which prints frame count e.g. when reading trajectory
files. Set to 0 for quiet operation.
+``GMX_ENABLE_GPU_TIMING``
+ Enables GPU timings in the log file for CUDA and SYCL. Note that CUDA
+ timings are incorrect with multiple streams, as happens with domain
+ decomposition or with both non-bondeds and PME on the GPU (this is
+ also the main reason why they are not turned on by default).
+
+``GMX_DISABLE_GPU_TIMING``
+ Disables GPU timings in the log file for OpenCL.
+
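As a rough illustration of how the two timing variables above interact, the following Python sketch encodes the defaults described in this section: timings are off by default with CUDA and SYCL and on by default with OpenCL. The helper name ``gpu_timing_enabled`` and the backend strings are hypothetical, and the sketch assumes that a variable counts as "set" whenever it is present, whatever its value.

```python
import os


def gpu_timing_enabled(backend, env=None):
    """Return True if GPU timings would be reported for the given backend.

    Hypothetical helper, not part of GROMACS. Assumption: a variable is
    treated as "set" if it is present in the environment at all.
    """
    env = os.environ if env is None else env
    if backend in ("cuda", "sycl"):
        # Off by default; GMX_ENABLE_GPU_TIMING opts in.
        return "GMX_ENABLE_GPU_TIMING" in env
    if backend == "opencl":
        # On by default; GMX_DISABLE_GPU_TIMING opts out.
        return "GMX_DISABLE_GPU_TIMING" not in env
    raise ValueError(f"unknown backend: {backend!r}")
```

Passing an explicit dictionary instead of relying on ``os.environ`` makes it easy to check a job script's settings before launching a run.
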
Debugging
---------
``GMX_PRINT_DEBUG_LINES``
over-ride the number of DD pulses used
(default 0, meaning no over-ride). Normally 1 or 2.
+``GMX_DISABLE_ALTERNATING_GPU_WAIT``
+ disables the specialized polling wait path, which waits on the PME and nonbonded
+ GPU tasks alternately so that the reduction of whichever forces arrive first can
+ overlap with the remaining wait. Setting this variable switches to the generic path
+ with a fixed waiting order.
+
+``GMX_TEST_REQUIRED_NUMBER_OF_DEVICES``
+ sets the number of GPUs required by the test suite. By default, the test suite
+ falls back to using the CPU if no GPUs can be detected. Set it to a positive integer value
+ to ensure that at least this number of usable GPUs is detected. Default:
+ 0 (not testing GPU availability).
+
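The behaviour described for ``GMX_TEST_REQUIRED_NUMBER_OF_DEVICES`` can be sketched as a simple threshold check. This is only an illustration in Python, not the test suite's actual code: the function name ``enough_devices`` is hypothetical, and rejecting negative values is an assumption of the sketch.

```python
def enough_devices(detected, env):
    """Return True if the detected GPU count satisfies the requirement.

    Hypothetical helper. An unset variable means 0, i.e. GPU
    availability is not tested and the CPU fallback is acceptable.
    """
    required = int(env.get("GMX_TEST_REQUIRED_NUMBER_OF_DEVICES", "0"))
    if required < 0:
        # Assumption of this sketch: negative values are rejected.
        raise ValueError("required number of devices must be non-negative")
    return detected >= required
```

With the variable unset, a machine without GPUs passes the check; with it set to ``2``, at least two usable devices must be detected.
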
There are a number of extra environment variables like these
that are used in debugging - check the code!
file. Normally, :mdp:`epsilon-r` must be greater than zero to prevent a fatal error.
See webpage_ for example input files for a planetary simulation.
-``GMX_ALLOW_CPT_MISMATCH``
- when set, runs will not exit if the
- ensemble set in the :ref:`tpr` file does not match that of the
- :ref:`cpt` file.
+``GMX_BONDED_NTHREAD_UNIFORM``
+ Number of threads per rank from which the distribution of bonded interactions
+ switches from uniform to localized. The optimal value depends on the
+ system and hardware; the default is 4.
-``GMX_CUDA_NB_EWALD_TWINCUT``
+``GMX_GPU_NB_EWALD_TWINCUT``
force the use of twin-range cutoff kernel even if :mdp:`rvdw` equals
:mdp:`rcoulomb` after PP-PME load balancing. The switch to twin-range kernels is automated,
so this variable should be used only for benchmarking.
-``GMX_CUDA_NB_ANA_EWALD``
+``GMX_GPU_NB_ANA_EWALD``
force the use of analytical Ewald kernels. Should be used only for benchmarking.
-``GMX_CUDA_NB_TAB_EWALD``
+``GMX_GPU_NB_TAB_EWALD``
force the use of tabulated Ewald kernels. Should be used only for benchmarking.
-``GMX_CUDA_STREAMSYNC``
- force the use of cudaStreamSynchronize on ECC-enabled GPUs, which leads
- to performance loss due to a known CUDA driver bug present in API v5.0 NVIDIA drivers (pre-30x.xx).
- Cannot be set simultaneously with ``GMX_NO_CUDA_STREAMSYNC``.
+``GMX_DISABLE_CUDA_TIMING``
+ Deprecated. Use ``GMX_DISABLE_GPU_TIMING`` instead.
+
+``GMX_GPU_DD_COMMS``
+ perform domain decomposition halo exchange communication operations (on coordinate and force buffers)
+ directly on GPU memory spaces, without the staging of data through CPU memory, where possible.
-``GMX_DISABLE_CUDALAUNCH``
- disable the use of the lower-latency cudaLaunchKernel API even when supported (CUDA >=v7.0).
- Should only be used for benchmarking purposes.
+``GMX_GPU_PME_PP_COMMS``
+ when the simulation uses a separate PME rank, perform communication operations between PP and PME rank
+ (for coordinate and force buffers) directly on GPU memory spaces, without the staging of data through CPU
+ memory, where possible.
-``GMX_DISABLE_CUDA_TIMING``
- Disables GPU timing of CUDA tasks; synonymous with ``GMX_DISABLE_GPU_TIMING``.
+``GMX_GPU_SYCL_NO_SYNCHRONIZE``
+ disables synchronization between different GPU streams in SYCL builds, relying instead on the SYCL runtime to
+ schedule work based on data dependencies. Experimental.
``GMX_CYCLE_ALL``
times all code during runs. Incompatible with threads.
``GMX_DISABLE_GPU_TIMING``
timing of asynchronously executed GPU operations can have a
non-negligible overhead with short step times. Disabling timing can improve performance in these cases.
+ Timings are disabled by default with CUDA and SYCL.
``GMX_DISABLE_GPU_DETECTION``
when set, disables GPU detection even if :ref:`gmx mdrun` was compiled
``GMX_EMULATE_GPU``
emulate GPU runs by using algorithmically equivalent CPU reference code instead of
GPU-accelerated functions. As the CPU code is slow, it is intended to be used only for debugging purposes.
- The behavior is automatically triggered if non-bonded calculations are turned off using ``GMX_NO_NONBONDED``
- case in which the non-bonded calculations will not be called, but the CPU-GPU transfer will also be skipped.
``GMX_ENX_NO_FATAL``
disable exiting upon encountering a corrupted frame in an :ref:`edr`
``GMX_FORCE_UPDATE``
update forces when invoking ``mdrun -rerun``.
+``GMX_FORCE_UPDATE_DEFAULT_GPU``
+ Force update to run on the GPU by default, overriding the ``mdrun -update auto`` option. Works similarly to setting
+ ``mdrun -update gpu``, but (1) falls back to the CPU code path if set with input that is not supported, and
+ (2) can be used to run update on GPUs in multi-rank cases. The latter case should be
+ considered experimental since it lacks substantial testing. In the multi-rank case, GPU update is only supported
+ with GPU direct communication, so ``GMX_FORCE_UPDATE_DEFAULT_GPU`` should be set together with the ``GMX_GPU_DD_COMMS``
+ and ``GMX_GPU_PME_PP_COMMS`` environment variables. Does not override ``mdrun -update cpu``.
+
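Because ``GMX_FORCE_UPDATE_DEFAULT_GPU`` is described as needing ``GMX_GPU_DD_COMMS`` and ``GMX_GPU_PME_PP_COMMS`` in the multi-rank case, a launch script might verify this before starting a run. The Python sketch below is one possible pre-flight check; the function name ``missing_for_gpu_update`` is hypothetical, and it assumes a variable counts as set whenever it is present.

```python
# Variables the text says must accompany GMX_FORCE_UPDATE_DEFAULT_GPU
# when running on more than one rank.
DIRECT_COMM_VARS = ("GMX_GPU_DD_COMMS", "GMX_GPU_PME_PP_COMMS")


def missing_for_gpu_update(env, num_ranks):
    """List the direct-communication variables still missing.

    Hypothetical helper. Returns an empty list when nothing is missing
    or when the requirement does not apply (single rank, or
    GMX_FORCE_UPDATE_DEFAULT_GPU not set).
    """
    if "GMX_FORCE_UPDATE_DEFAULT_GPU" not in env or num_ranks <= 1:
        return []
    return [name for name in DIRECT_COMM_VARS if name not in env]
```

A non-empty result names the variables to export before launching the multi-rank run.
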
``GMX_GPU_ID``
set in the same way as ``mdrun -gpu_id``, ``GMX_GPU_ID``
- allows the user to specify different GPU id-s, which can be useful for selecting different
+ allows the user to specify different GPU IDs for different ranks, which can be useful for selecting different
devices on different compute nodes in a cluster. Cannot be used in conjunction with ``mdrun -gpu_id``.
+``GMX_GPUTASKS``
+ set in the same way as ``mdrun -gputasks``, ``GMX_GPUTASKS`` allows the mapping
+ of GPU tasks to GPU device IDs to be different on different ranks, if e.g. the MPI
+ runtime permits this variable to be different for different ranks. Cannot be used
+ in conjunction with ``mdrun -gputasks``. Has all the same requirements as ``mdrun -gputasks``.
+
+``GMX_GPU_DISABLE_COMPATIBILITY_CHECK``
+ Disables the hardware compatibility check in OpenCL and SYCL. Useful for developers
+ and allows testing the OpenCL/SYCL kernels on non-supported platforms without source code modification.
+
``GMX_IGNORE_FSYNC_FAILURE_ENV``
allow :ref:`gmx mdrun` to continue even if
a file is missing.
if set to -1, :ref:`gmx mdrun` will
not exit if it produces too many LINCS warnings.
-``GMX_NB_GENERIC``
- use the generic C kernel. Should be set if using
- the group-based cutoff scheme and also sets ``GMX_NO_SOLV_OPT`` to be true,
- thus disabling solvent optimizations as well.
-
``GMX_NB_MIN_CI``
neighbor list balancing parameter used when running on GPU. Sets the
target minimum number of pair-lists in order to improve multi-processor load-balance for better
performance with small simulation systems. Must be set to a non-negative integer,
the 0 value disables list splitting.
- The default value is optimized for supported GPUs (NVIDIA Fermi to Maxwell),
+ The default value is optimized for supported GPUs
therefore changing it is not necessary for normal usage, but it can be useful on future architectures.
``GMX_NBLISTCG``
force the use of 4xN SIMD CPU non-bonded kernels,
mutually exclusive of ``GMX_NBNXN_SIMD_2XNN``.
-``GMX_NO_ALLVSALL``
- disables optimized all-vs-all kernels.
+``GMX_NOOPTIMIZEDKERNELS``
+ deprecated, use ``GMX_DISABLE_SIMD_KERNELS`` instead.
``GMX_NO_CART_REORDER``
used in initializing domain decomposition communicators. Rank reordering
force the use of LJ parameter lookup instead of using combination rules
in the non-bonded kernels.
-``GMX_NO_CUDA_STREAMSYNC``
- the opposite of ``GMX_CUDA_STREAMSYNC``. Disables the use of the
- standard cudaStreamSynchronize-based GPU waiting to improve performance when using CUDA driver API
- ealier than v5.0 with ECC-enabled GPUs.
-
``GMX_NO_INT``, ``GMX_NO_TERM``, ``GMX_NO_USR1``
disable signal handlers for SIGINT,
SIGTERM, and SIGUSR1, respectively.
skip non-bonded calculations; can be used to estimate the possible
performance gain from adding a GPU accelerator to the current hardware setup -- assuming that this is
fast enough to complete the non-bonded calculations while the CPU does bonded force and PME computation.
+ Freezing the particles will be required to stop the system blowing up.
-``GMX_NO_PULLVIR``
- when set, do not add virial contribution to COM pull forces.
+``GMX_PULL_PARTICIPATE_ALL``
+ disable the default heuristic for when to use a separate pull MPI communicator (at >=32 ranks).
``GMX_NOPREDICT``
shell positions are not predicted.
-``GMX_NO_SOLV_OPT``
- turns off solvent optimizations; automatic if ``GMX_NB_GENERIC``
- is enabled.
+``GMX_NO_UPDATEGROUPS``
+ turns off update groups. May allow for a decomposition of more
+ domains for small systems at the cost of communication during update.
``GMX_NSCELL_NCG``
the ideal number of charge groups per neighbor searching grid cell is hard-coded
to a value of 10. Setting this environment variable to any other integer value overrides this hard-coded
value.
-``GMX_PME_NTHREADS``
- set the number of OpenMP or PME threads (overrides the number guessed by
- :ref:`gmx mdrun`.
+``GMX_PME_NUM_THREADS``
+ set the number of OpenMP or PME threads; overrides the default set by
+ :ref:`gmx mdrun`; can be used instead of the ``-npme`` command line option,
+ and is also useful for setting heterogeneous per-process/-node thread counts.
``GMX_PME_P3M``
use P3M-optimized influence function instead of smooth PME B-spline interpolation.
``GMX_PME_THREAD_DIVISION``
PME thread division in the format "x y z" for all three dimensions. The
sum of the threads in each dimension must equal the total number of PME threads (set in
- `GMX_PME_NTHREADS`).
+ :envvar:`GMX_PME_NUM_THREADS`).
``GMX_PMEONEDD``
if the number of domain decomposition cells is set to 1 for both x and y,
require the use of tabulated Coulombic
and van der Waals interactions.
-``GMX_SCSIGMA_MIN``
- the minimum value for soft-core sigma. **Note** that this value is set
- using the :mdp:`sc-sigma` keyword in the :ref:`mdp` file, but this environment variable can be used
- to reproduce pre-4.5 behavior with respect to this parameter.
-
``GMX_TPIC_MASSES``
should contain multiple masses used for test particle insertion into a cavity.
The center of mass of the last atoms is used for insertion into the cavity.
``HWLOC_XMLFILE``
Not strictly a |Gromacs| environment variable, but on large machines
the hwloc detection can take a few seconds if you have lots of MPI processes.
- If you run the hwloc command `lstopo out.xml` and set this environment
+ If you run the hwloc command :command:`lstopo out.xml` and set this environment
variable to point to the location of this file, the hwloc library will use
the cached information instead, which can be faster.
``MDRUN``
the :ref:`gmx mdrun` command used by :ref:`gmx tune_pme`.
-``GMX_NSTLIST``
- sets the default value for :mdp:`nstlist`, preventing it from being tuned during
- :ref:`gmx mdrun` startup when using the Verlet cutoff scheme.
+``GMX_DISABLE_DYNAMICPRUNING``
+ disables dynamic pair-list pruning. Note that :ref:`gmx mdrun` will
+ still tune :mdp:`nstlist` to the optimal value picked assuming dynamic pruning. Thus,
+ for good performance, the ``-nstlist`` option should be used.
-``GMX_USE_TREEREDUCE``
- use tree reduction for nbnxn force reduction. Potentially faster for large number of
- OpenMP threads (if memory locality is important).
+``GMX_NSTLIST_DYNAMICPRUNING``
+ overrides the dynamic pair-list pruning interval chosen heuristically
+ by mdrun. Values should be between the pruning frequency value
+ (1 for CPU and 2 for GPU) and :mdp:`nstlist` ``- 1``.
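The allowed range for ``GMX_NSTLIST_DYNAMICPRUNING`` can be written as a small check. The Python sketch below assumes both bounds are inclusive, which the text does not state explicitly; the function name ``valid_pruning_interval`` is hypothetical.

```python
def valid_pruning_interval(value, nstlist, on_gpu):
    """Return True if value is an acceptable dynamic pruning interval.

    Hypothetical helper. Lower bound: the pruning frequency
    (1 on CPU, 2 on GPU). Upper bound: nstlist - 1.
    Assumption of this sketch: both bounds are inclusive.
    """
    lower = 2 if on_gpu else 1
    return lower <= value <= nstlist - 1
```

For example, with ``nstlist`` equal to 10, an interval of 1 is accepted on the CPU but not on the GPU, and 10 is rejected in both cases.
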
.. _opencl-management:
``GMX_OCL_DISABLE_FASTMATH``
Prevents the use of ``-cl-fast-relaxed-math`` compiler option.
+ Note: fast math is always disabled on Intel devices due to instability.
``GMX_OCL_DUMP_LOG``
If defined, the OpenCL build log is always written to the
``GMX_OCL_NOGENCACHE``).
- NVIDIA GPUs: PTX code is saved in the current directory
- with the name ``device_name.ptx``
- - AMD GPUs: ``.IL/.ISA`` files will be created for each OpenCL
+ with the name ``device_name.ptx``
+ - AMD GPUs: ``.IL/.ISA`` files will be created for each OpenCL
kernel built. For details about where these files are
created check AMD documentation for ``-save-temps`` compiler
option.
simplicity of stepping in a kernel and see what is happening.
``GMX_OCL_DISABLE_I_PREFETCH``
- Disables i-atom data (type or LJ parameter) prefetch allowig
+ Disables i-atom data (type or LJ parameter) prefetch allowing
testing.
``GMX_OCL_ENABLE_I_PREFETCH``
- Enables i-atom data (type or LJ parameter) prefetch allowig
+ Enables i-atom data (type or LJ parameter) prefetch allowing
testing on platforms where this behavior is not default.
-``GMX_OCL_NB_ANA_EWALD``
- Forces the use of analytical Ewald kernels. Equivalent of
- CUDA environment variable ``GMX_CUDA_NB_ANA_EWALD``
-
-``GMX_OCL_NB_TAB_EWALD``
- Forces the use of tabulated Ewald kernel. Equivalent
- of CUDA environment variable ``GMX_OCL_NB_TAB_EWALD``
-
-``GMX_OCL_NB_EWALD_TWINCUT``
- Forces the use of twin-range cutoff kernel. Equivalent of
- CUDA environment variable ``GMX_CUDA_NB_EWALD_TWINCUT``
-
-``GMX_DISABLE_OCL_TIMING``
- Disables timing for OpenCL operations
-
``GMX_OCL_FILE_PATH``
Use this parameter to force |Gromacs| to load the OpenCL
kernels from a custom location. Use it only if you want to
override |Gromacs| default behavior, or if you want to test
your own kernels.
+``GMX_OCL_SHOW_DIAGNOSTICS``
+ Use Intel OpenCL extension to show additional runtime performance
+ diagnostics.
+
Analysis and Core Functions
---------------------------
-``GMX_QM_ACCURACY``
- accuracy in Gaussian L510 (MC-SCF) component program.
-
-``GMX_QM_ORCA_BASENAME``
- prefix of :ref:`tpr` files, used in Orca calculations
- for input and output file names.
-
-``GMX_QM_CPMCSCF``
- when set to a nonzero value, Gaussian QM calculations will
- iteratively solve the CP-MCSCF equations.
-
-``GMX_QM_MODIFIED_LINKS_DIR``
- location of modified links in Gaussian.
``DSSP``
used by :ref:`gmx do_dssp` to point to the ``dssp``
executable (not just its path).
-``GMX_QM_GAUSS_DIR``
- directory where Gaussian is installed.
-
-``GMX_QM_GAUSS_EXE``
- name of the Gaussian executable.
-
``GMX_DIPOLE_SPACING``
spacing used by :ref:`gmx dipoles`.
sets the maximum number of residues to be renumbered by
:ref:`gmx grompp`. A value of -1 indicates all residues should be renumbered.
-``GMX_FFRTP_TER_RENAME``
+``GMX_NO_FFRTP_TER_RENAME``
Some force fields (like AMBER) use specific names for N- and C-
terminal residues (NXXX and CXXX) as :ref:`rtp` entries that are normally renamed. Setting
this environment variable disables this renaming.
-``GMX_PATH_GZIP``
- ``gunzip`` executable, used by :ref:`gmx wham`.
-
``GMX_FONT``
name of X11 font used by :ref:`gmx view`.
the time unit used in output files, can be
anything in fs, ps, ns, us, ms, s, m or h.
-``GMX_QM_GAUSSIAN_MEMORY``
- memory used for Gaussian QM calculation.
-
``MULTIPROT``
name of the ``multiprot`` executable, used by the
contributed program ``do_multiprot``.
``NCPUS``
number of CPUs to be used for Gaussian QM calculation
-``GMX_ORCA_PATH``
- directory where Orca is installed.
-
-``GMX_QM_SA_STEP``
- simulated annealing step size for Gaussian QM calculation.
-
-``GMX_QM_GROUND_STATE``
- defines state for Gaussian surface hopping calculation.
-
``GMX_TOTAL``
name of the ``total`` executable used by the contributed
``do_shift`` program.