Disable fastmath with OpenCL on Intel devices
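
The rule this commit documents for ``GMX_OCL_DISABLE_FASTMATH`` can be summarized in a minimal Python sketch. ``-cl-fast-relaxed-math`` is requested unless the variable is set or the device is an Intel device. The helper name ``opencl_build_options`` is hypothetical. Two details are also assumptions for illustration: Intel devices are recognized by a case-insensitive "intel" match on the vendor string, and any value of the variable (even an empty one) counts as set. This is not the GROMACS implementation.

```python
import os

FAST_MATH_FLAG = "-cl-fast-relaxed-math"


def opencl_build_options(device_vendor, env=None):
    """Return extra OpenCL compiler options for a device (hypothetical helper).

    Fast relaxed math is requested unless GMX_OCL_DISABLE_FASTMATH is present
    in the environment or the vendor string looks like an Intel device.
    """
    if env is None:
        env = os.environ
    # Assumption: any value, even an empty string, counts as "set".
    disabled_by_user = "GMX_OCL_DISABLE_FASTMATH" in env
    # Assumption: Intel devices are recognized by their vendor string.
    is_intel = "intel" in device_vendor.lower()
    if disabled_by_user or is_intel:
        return []
    return [FAST_MATH_FLAG]


print(opencl_build_options("NVIDIA Corporation", env={}))    # ['-cl-fast-relaxed-math']
print(opencl_build_options("Intel(R) Corporation", env={}))  # []
```

Passing ``env`` explicitly keeps the decision testable without touching the real process environment.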

diff --git a/docs/user-guide/environment-variables.rst b/docs/user-guide/environment-variables.rst
index 5ed6d03587e64a3ba315065da3e2e5e1e1296b53..1698fcedb260d307b15f8c9c182f17e920165895 100644
--- a/docs/user-guide/environment-variables.rst
+++ b/docs/user-guide/environment-variables.rst
@@ -1,3 +1,9 @@
+.. NOTE: Below is a useful bash one-liner to verify whether there are variables in this file
+..        no longer present in the code.
+.. ( export INPUT_FILE='docs/user-guide/environment-variables.rst' GIT_PAGER="cat "; for s in $(grep '^`'  $INPUT_FILE | sed 's/`//g' | sed 's/,/ /g'); do count=$(git grep $s | grep -v $INPUT_FILE | wc -l); [ $count -eq 0 ] && printf "%-30s%s\n" $s $count; done ; )
+.. Another useful one-liner to find undocumented variables:
+..  ( export INPUT_FILE=docs/user-guide/environment-variables.rst GIT_PAGER="cat"; for ss in `for s in $(git grep getenv |  sed 's/.*getenv("\(.*\)".*/\1/' | sort -u  | grep '^[A-Z]'); do [ $(grep $s $INPUT_FILE -c) -eq 0 ] && echo $s; done `; do git grep $ss ; done )
+
  Environment Variables
  =====================
  
@@ -16,8 +22,8 @@ you should consult your local documentation for details.
  
  Output Control
  --------------
-``GMX_CONSTRAINTVIR``
-        Print constraint virial and force virial energy terms.
+``GMX_DUMP_NL``
+        Neighbour list dump level; default 0.
  
  ``GMX_MAXBACKUP``
          |Gromacs| automatically backs up old
@@ -52,9 +58,6 @@ Output Control
          Be careful not to use a command which blocks the terminal
          (e.g. ``vi``), since multiple instances might be run.
  
-``GMX_VIRIAL_TEMPERATURE``
-        print virial temperature energy term
-
  ``GMX_LOG_BUFFER``
          the size of the buffer for file I/O. When set
          to 0, all file I/O will be unbuffered and therefore very slow.
@@ -119,6 +122,12 @@ Debugging
          arrive first. Setting this variable switches to the generic path with fixed waiting
          order.
  
+``GMX_TEST_REQUIRED_NUMBER_OF_DEVICES``
+        sets the number of GPUs required by the test suite. By default, the test suite falls
+        back to using the CPU if GPUs cannot be detected. Set it to a positive integer value
+        to ensure that at least this number of usable GPUs is detected. Default:
+        0 (GPU availability is not tested).
+
  There are a number of extra environment variables like these
  that are used in debugging - check the code!
  
@@ -130,33 +139,37 @@ Performance and Run Control
          file. Normally, :mdp:`epsilon-r` must be greater than zero to prevent a fatal error.
          See webpage_ for example input files for a planetary simulation.
  
-``GMX_ALLOW_CPT_MISMATCH``
-        when set, runs will not exit if the
-        ensemble set in the :ref:`tpr` file does not match that of the
-        :ref:`cpt` file.
+``GMX_BONDED_NTHREAD_UNIFORM``
+        Number of threads per rank from which to switch from uniform
+        to localized bonded interaction distribution; the optimal value depends on the
+        system and hardware. Default: 4.
  
-``GMX_CUDA_NB_EWALD_TWINCUT``
+``GMX_GPU_NB_EWALD_TWINCUT``
          force the use of twin-range cutoff kernel even if :mdp:`rvdw` equals
          :mdp:`rcoulomb` after PP-PME load balancing. The switch to twin-range kernels is automated,
          so this variable should be used only for benchmarking.
  
-``GMX_CUDA_NB_ANA_EWALD``
+``GMX_GPU_NB_ANA_EWALD``
          force the use of analytical Ewald kernels. Should be used only for benchmarking.
  
-``GMX_CUDA_NB_TAB_EWALD``
+``GMX_GPU_NB_TAB_EWALD``
          force the use of tabulated Ewald kernels. Should be used only for benchmarking.
  
-``GMX_CUDA_STREAMSYNC``
-        force the use of cudaStreamSynchronize on ECC-enabled GPUs, which leads
-        to performance loss due to a known CUDA driver bug present in API v5.0 NVIDIA drivers (pre-30x.xx).
-        Cannot be set simultaneously with ``GMX_NO_CUDA_STREAMSYNC``.
+``GMX_DISABLE_CUDA_TIMING``
+        Deprecated. Use ``GMX_DISABLE_GPU_TIMING`` instead.
  
-``GMX_DISABLE_CUDALAUNCH``
-        disable the use of the lower-latency cudaLaunchKernel API even when supported (CUDA >=v7.0).
-        Should only be used for benchmarking purposes.
+``GMX_GPU_DD_COMMS``
+        perform domain decomposition halo exchange communication operations (on coordinate and force buffers)
+        directly on GPU memory spaces, without the staging of data through CPU memory, where possible.
  
-``GMX_DISABLE_CUDA_TIMING``
-        Disables GPU timing of CUDA tasks; synonymous with ``GMX_DISABLE_GPU_TIMING``.
+``GMX_GPU_PME_PP_COMMS``
+        when the simulation uses a separate PME rank, perform communication operations between PP and PME ranks
+        (for coordinate and force buffers) directly on GPU memory spaces, without the staging of data through CPU
+        memory, where possible.
+
+``GMX_GPU_SYCL_NO_SYNCHRONIZE``
+        disable synchronizations between different GPU streams in SYCL builds, instead relying on the SYCL runtime to
+        do scheduling based on data dependencies. Experimental.
  
  ``GMX_CYCLE_ALL``
          times all code during runs.  Incompatible with threads.
@@ -222,6 +235,14 @@ Performance and Run Control
  ``GMX_FORCE_UPDATE``
          update forces when invoking ``mdrun -rerun``.
  
+``GMX_FORCE_UPDATE_DEFAULT_GPU``
+        Force update to run on the GPU by default, overriding the ``mdrun -update auto`` option. Works similarly to setting
+        ``mdrun -update gpu``, but (1) falls back to the CPU code path if the input is not supported, and
+        (2) can be used to run update on GPUs in multi-rank cases. The latter case should be
+        considered experimental since it lacks substantial testing. Also, in the multi-rank case GPU update is only supported
+        with GPU direct communications, so ``GMX_FORCE_UPDATE_DEFAULT_GPU`` should be set together with the ``GMX_GPU_DD_COMMS``
+        and ``GMX_GPU_PME_PP_COMMS`` environment variables. Does not override ``mdrun -update cpu``.
+
  ``GMX_GPU_ID``
          set in the same way as ``mdrun -gpu_id``, ``GMX_GPU_ID``
          allows the user to specify different GPU IDs for different ranks, which can be useful for selecting different
@@ -233,6 +254,10 @@ Performance and Run Control
          runtime permits this variable to be different for different ranks. Cannot be used
          in conjunction with ``mdrun -gputasks``. Has all the same requirements as ``mdrun -gputasks``.
  
+``GMX_GPU_DISABLE_COMPATIBILITY_CHECK``
+        Disables the hardware compatibility check in OpenCL and SYCL. Useful for developers
+        and allows testing the OpenCL/SYCL kernels on unsupported platforms without source code modification.
+
  ``GMX_IGNORE_FSYNC_FAILURE_ENV``
          allow :ref:`gmx mdrun` to continue even if
          a file is missing.
@@ -245,17 +270,12 @@ Performance and Run Control
          if set to -1, :ref:`gmx mdrun` will
          not exit if it produces too many LINCS warnings.
  
-``GMX_NB_GENERIC``
-        use the generic C kernel.  Should be set if using
-        the group-based cutoff scheme and also sets ``GMX_NO_SOLV_OPT`` to be true,
-        thus disabling solvent optimizations as well.
-
  ``GMX_NB_MIN_CI``
          neighbor list balancing parameter used when running on GPU. Sets the
          target minimum number pair-lists in order to improve multi-processor load-balance for better
          performance with small simulation systems. Must be set to a non-negative integer,
          the 0 value disables list splitting.
-        The default value is optimized for supported GPUs (NVIDIA Fermi to Maxwell),
+        The default value is optimized for supported GPUs,
          therefore changing it is not necessary for normal usage, but it can be useful on future architectures.
  
  ``GMX_NBLISTCG``
@@ -280,8 +300,8 @@ Performance and Run Control
          force the use of 4xN SIMD CPU non-bonded kernels,
          mutually exclusive of ``GMX_NBNXN_SIMD_2XNN``.
  
-``GMX_NO_ALLVSALL``
-        disables optimized all-vs-all kernels.
+``GMX_NOOPTIMIZEDKERNELS``
+        deprecated, use ``GMX_DISABLE_SIMD_KERNELS`` instead.
  
  ``GMX_NO_CART_REORDER``
          used in initializing domain decomposition communicators. Rank reordering
@@ -291,11 +311,6 @@ Performance and Run Control
          force the use of LJ paremeter lookup instead of using combination rules
          in the non-bonded kernels.
  
-``GMX_NO_CUDA_STREAMSYNC``
-        the opposite of ``GMX_CUDA_STREAMSYNC``. Disables the use of the
-        standard cudaStreamSynchronize-based GPU waiting to improve performance when using CUDA driver API
-        ealier than v5.0 with ECC-enabled GPUs.
-
  ``GMX_NO_INT``, ``GMX_NO_TERM``, ``GMX_NO_USR1``
          disable signal handlers for SIGINT,
          SIGTERM, and SIGUSR1, respectively.
@@ -309,24 +324,25 @@ Performance and Run Control
          fast enough to complete the non-bonded calculations while the CPU does bonded force and PME computation.
          Freezing the particles will be required to stop the system blowing up.
  
-``GMX_NO_PULLVIR``
-        when set, do not add virial contribution to COM pull forces.
+``GMX_PULL_PARTICIPATE_ALL``
+        disable the default heuristic for when to use a separate pull MPI communicator (at >=32 ranks).
  
  ``GMX_NOPREDICT``
          shell positions are not predicted.
  
-``GMX_NO_SOLV_OPT``
-        turns off solvent optimizations; automatic if ``GMX_NB_GENERIC``
-        is enabled.
+``GMX_NO_UPDATEGROUPS``
+        turns off update groups. May allow decomposition into more
+        domains for small systems, at the cost of communication during update.
  
  ``GMX_NSCELL_NCG``
          the ideal number of charge groups per neighbor searching grid cell is hard-coded
          to a value of 10. Setting this environment variable to any other integer value overrides this hard-coded
          value.
  
-``GMX_PME_NTHREADS``
-        set the number of OpenMP or PME threads (overrides the number guessed by
-        :ref:`gmx mdrun`.
+``GMX_PME_NUM_THREADS``
+        set the number of OpenMP threads used for PME; overrides the default set by
+        :ref:`gmx mdrun`; can be used instead of the ``-ntomp_pme`` command line option,
+        and is also useful for setting heterogeneous per-process/-node thread counts.
  
  ``GMX_PME_P3M``
          use P3M-optimized influence function instead of smooth PME B-spline interpolation.
@@ -334,7 +350,7 @@ Performance and Run Control
  ``GMX_PME_THREAD_DIVISION``
          PME thread division in the format "x y z" for all three dimensions. The
          sum of the threads in each dimension must equal the total number of PME threads (set in
-        `GMX_PME_NTHREADS`).
+        ``GMX_PME_NUM_THREADS``).
  
  ``GMX_PMEONEDD``
          if the number of domain decomposition cells is set to 1 for both x and y,
@@ -347,11 +363,6 @@ Performance and Run Control
          require the use of tabulated Coulombic
          and van der Waals interactions.
  
-``GMX_SCSIGMA_MIN``
-        the minimum value for soft-core sigma. **Note** that this value is set
-        using the :mdp:`sc-sigma` keyword in the :ref:`mdp` file, but this environment variable can be used
-        to reproduce pre-4.5 behavior with respect to this parameter.
-
  ``GMX_TPIC_MASSES``
          should contain multiple masses used for test particle insertion into a cavity.
          The center of mass of the last atoms is used for insertion into the cavity.
@@ -366,7 +377,7 @@ Performance and Run Control
  ``HWLOC_XMLFILE``
          Not strictly a |Gromacs| environment variable, but on large machines
          the hwloc detection can take a few seconds if you have lots of MPI processes.
-        If you run the hwloc command `lstopo out.xml` and set this environment
+        If you run the hwloc command :command:`lstopo out.xml` and set this environment
          variable to point to the location of this file, the hwloc library will use
          the cached information instead, which can be faster.
  
@@ -386,10 +397,6 @@ Performance and Run Control
          by mdrun. Values should be between the pruning frequency value
          (1 for CPU and 2 for GPU) and :mdp:`nstlist` ``- 1``.
  
-``GMX_USE_TREEREDUCE``
-        use tree reduction for nbnxn force reduction. Potentially faster for large number of
-        OpenMP threads (if memory locality is important).
-
  .. _opencl-management:
  
  OpenCL management
@@ -416,6 +423,7 @@ compilation of OpenCL kernels, but they are also used in device selection.
  
  ``GMX_OCL_DISABLE_FASTMATH``
          Prevents the use of ``-cl-fast-relaxed-math`` compiler option.
+        Note: fast math is always disabled on Intel devices due to instability.
  
  ``GMX_OCL_DUMP_LOG``
          If defined, the OpenCL build log is always written to the
@@ -435,8 +443,8 @@ compilation of OpenCL kernels, but they are also used in device selection.
          ``GMX_OCL_NOGENCACHE``).
  
              - NVIDIA GPUs: PTX code is saved in the current directory
-             with the name ``device_name.ptx``
-           - AMD GPUs: ``.IL/.ISA`` files will be created for each OpenCL
+              with the name ``device_name.ptx``
+            - AMD GPUs: ``.IL/.ISA`` files will be created for each OpenCL
                kernel built.  For details about where these files are
                created check AMD documentation for ``-save-temps`` compiler
                option.
@@ -456,62 +464,30 @@ compilation of OpenCL kernels, but they are also used in device selection.
          simplicity of stepping in a kernel and see what is happening.
  
  ``GMX_OCL_DISABLE_I_PREFETCH``
-        Disables i-atom data (type or LJ parameter) prefetch allowig
+        Disables i-atom data (type or LJ parameter) prefetch allowing
          testing.
  
  ``GMX_OCL_ENABLE_I_PREFETCH``
-        Enables i-atom data (type or LJ parameter) prefetch allowig
+        Enables i-atom data (type or LJ parameter) prefetch allowing
          testing on platforms where this behavior is not default.
  
-``GMX_OCL_NB_ANA_EWALD``
-        Forces the use of analytical Ewald kernels. Equivalent of
-        CUDA environment variable ``GMX_CUDA_NB_ANA_EWALD``
-
-``GMX_OCL_NB_TAB_EWALD``
-        Forces the use of tabulated Ewald kernel. Equivalent
-        of CUDA environment variable ``GMX_OCL_NB_TAB_EWALD``
-
-``GMX_OCL_NB_EWALD_TWINCUT``
-        Forces the use of twin-range cutoff kernel. Equivalent of
-        CUDA environment variable ``GMX_CUDA_NB_EWALD_TWINCUT``
-
  ``GMX_OCL_FILE_PATH``
          Use this parameter to force |Gromacs| to load the OpenCL
          kernels from a custom location. Use it only if you want to
          override |Gromacs| default behavior, or if you want to test
          your own kernels.
  
-``GMX_OCL_DISABLE_COMPATIBILITY_CHECK``
-        Disables the hardware compatibility check. Useful for developers
-        and allows testing the OpenCL kernels on non-supported platforms
-        (like Intel iGPUs) without source code modification.
+``GMX_OCL_SHOW_DIAGNOSTICS``
+        Use the Intel OpenCL extension to show additional runtime performance
+        diagnostics.
  
  Analysis and Core Functions
  ---------------------------
-``GMX_QM_ACCURACY``
-        accuracy in Gaussian L510 (MC-SCF) component program.
-
-``GMX_QM_ORCA_BASENAME``
-        prefix of :ref:`tpr` files, used in Orca calculations
-        for input and output file names.
-
-``GMX_QM_CPMCSCF``
-        when set to a nonzero value, Gaussian QM calculations will
-        iteratively solve the CP-MCSCF equations.
-
-``GMX_QM_MODIFIED_LINKS_DIR``
-        location of modified links in Gaussian.
  
  ``DSSP``
          used by :ref:`gmx do_dssp` to point to the ``dssp``
          executable (not just its path).
  
-``GMX_QM_GAUSS_DIR``
-        directory where Gaussian is installed.
-
-``GMX_QM_GAUSS_EXE``
-        name of the Gaussian executable.
-
  ``GMX_DIPOLE_SPACING``
          spacing used by :ref:`gmx dipoles`.
  
@@ -519,14 +495,11 @@ Analysis and Core Functions
          sets the maximum number of residues to be renumbered by
          :ref:`gmx grompp`. A value of -1 indicates all residues should be renumbered.
  
-``GMX_FFRTP_TER_RENAME``
+``GMX_NO_FFRTP_TER_RENAME``
          Some force fields (like AMBER) use specific names for N- and C-
          terminal residues (NXXX and CXXX) as :ref:`rtp` entries that are normally renamed. Setting
          this environment variable disables this renaming.
  
-``GMX_PATH_GZIP``
-        ``gunzip`` executable, used by :ref:`gmx wham`.
-
  ``GMX_FONT``
          name of X11 font used by :ref:`gmx view`.
  
@@ -534,9 +507,6 @@ Analysis and Core Functions
          the time unit used in output files, can be
          anything in fs, ps, ns, us, ms, s, m or h.
  
-``GMX_QM_GAUSSIAN_MEMORY``
-        memory used for Gaussian QM calculation.
-
  ``MULTIPROT``
          name of the ``multiprot`` executable, used by the
          contributed program ``do_multiprot``.
@@ -544,15 +514,6 @@ Analysis and Core Functions
  ``NCPUS``
          number of CPUs to be used for Gaussian QM calculation
  
-``GMX_ORCA_PATH``
-        directory where Orca is installed.
-
-``GMX_QM_SA_STEP``
-        simulated annealing step size for Gaussian QM calculation.
-
-``GMX_QM_GROUND_STATE``
-        defines state for Gaussian surface hopping calculation.
-
  ``GMX_TOTAL``
          name of the ``total`` executable used by the contributed
          ``do_shift`` program.
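
The NOTE comment added at the top of the file contains two bash one-liners. The first lists documented variables that no longer appear in the code; the second lists variables read with ``getenv`` that are not documented. They can be approximated in Python, which may be easier to read and test.

This is a sketch under several assumptions:

- The documentation and source text are passed in as strings rather than gathered with ``git grep``.
- Only ``getenv("NAME")`` calls with a literal name starting with an upper-case letter count as uses.
- As in the ``grep`` version, matching is by plain substring.

The function names are hypothetical.

```python
import re

# Documented names sit at the start of a definition-list term, e.g.
# ``GMX_NO_INT``, ``GMX_NO_TERM``, ``GMX_NO_USR1``
TERM_NAME = re.compile(r"``([A-Za-z0-9_]+)``")
# Literal names passed to getenv() (also matches std::getenv), upper-case first letter
GETENV_NAME = re.compile(r'getenv\("([A-Z][A-Za-z0-9_]*)"\)')


def documented_variables(rst_text):
    """Names listed as terms in the environment-variable documentation."""
    names = set()
    for line in rst_text.splitlines():
        if line.startswith("``"):
            names.update(TERM_NAME.findall(line))
    return names


def variables_read_in_code(source_text):
    """Names the source reads with getenv("NAME")."""
    return set(GETENV_NAME.findall(source_text))


def find_mismatches(rst_text, source_text):
    """Return (stale, undocumented), both sorted.

    stale: documented names that no longer occur anywhere in the source.
    undocumented: names read via getenv() that never occur in the docs.
    """
    stale = sorted(n for n in documented_variables(rst_text) if n not in source_text)
    undocumented = sorted(
        n for n in variables_read_in_code(source_text) if n not in rst_text
    )
    return stale, undocumented


docs = "``GMX_NO_INT``, ``GMX_NO_TERM``\n        disable signal handlers\n"
code = 'if (getenv("GMX_NO_INT")) {}\nif (getenv("GMX_DUMP_NL")) {}\n'
print(find_mismatches(docs, code))  # (['GMX_NO_TERM'], ['GMX_DUMP_NL'])
```

Substring matching keeps the behaviour close to the original ``grep`` checks. It can also produce the same false matches, for example a name that is a prefix of a longer variable name.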