Disable fastmath with OpenCL on Intel devices

[alexxy/gromacs.git] / docs / user-guide / environment-variables.rst
diff --git a/docs/user-guide/environment-variables.rst b/docs/user-guide/environment-variables.rst

index 61eeb169e793c39f93c1bda8ed1007ddbad419e9..1698fcedb260d307b15f8c9c182f17e920165895 100644 (file)
--- a/docs/user-guide/environment-variables.rst
+++ b/docs/user-guide/environment-variables.rst
@@ -4,8 +4,6 @@
  .. Another useful one-liner to find undocumented variables:
  ..  ( export INPUT_FILE=docs/user-guide/environment-variables.rst; GIT_PAGER="cat ";   for ss in `for s in $(git grep getenv |  sed 's/.*getenv("\(.*\)".*/\1/' | sort -u  | grep '^[A-Z]'); do [ $(grep $s $INPUT_FILE -c) -eq 0 ] && echo $s; done `; do git grep $ss ; done )
  
-.. TODO: still undocumented GMX_QM_GAUSSIAN_NCPUS
-
  Environment Variables
  =====================
  
@@ -24,9 +22,6 @@ you should consult your local documentation for details.
  
  Output Control
  --------------
-``GMX_CONSTRAINTVIR``
-        Print constraint virial and force virial energy terms.
-
  ``GMX_DUMP_NL``
          Neighbour list dump level; default 0.
  
@@ -127,6 +122,12 @@ Debugging
          arrive first. Setting this variable switches to the generic path with fixed waiting
          order.
  
+``GMX_TEST_REQUIRED_NUMBER_OF_DEVICES``
+        sets the number of GPUs required by the test suite. By default, the test suite
+        falls back to using the CPU if no GPUs are detected. Set it to a positive integer value
+        to ensure that at least this number of usable GPUs is detected. Default:
+        0 (not testing GPU availability).
+
  There are a number of extra environment variables like these
  that are used in debugging - check the code!
  
@@ -138,34 +139,38 @@ Performance and Run Control
          file. Normally, :mdp:`epsilon-r` must be greater than zero to prevent a fatal error.
          See webpage_ for example input files for a planetary simulation.
  
-``GMX_ALLOW_CPT_MISMATCH``
-        when set, runs will not exit if the
-        ensemble set in the :ref:`tpr` file does not match that of the
-        :ref:`cpt` file.
-
  ``GMX_BONDED_NTHREAD_UNIFORM``
          Value of the number of threads per rank from which to switch from uniform
          to localized bonded interaction distribution; optimal value dependent on
          system and hardware, default value is 4.
  
-``GMX_CUDA_NB_EWALD_TWINCUT``
+``GMX_GPU_NB_EWALD_TWINCUT``
          force the use of twin-range cutoff kernel even if :mdp:`rvdw` equals
          :mdp:`rcoulomb` after PP-PME load balancing. The switch to twin-range kernels is automated,
          so this variable should be used only for benchmarking.
  
-``GMX_CUDA_NB_ANA_EWALD``
+``GMX_GPU_NB_ANA_EWALD``
          force the use of analytical Ewald kernels. Should be used only for benchmarking.
  
-``GMX_CUDA_NB_TAB_EWALD``
+``GMX_GPU_NB_TAB_EWALD``
          force the use of tabulated Ewald kernels. Should be used only for benchmarking.
  
-``GMX_DISABLE_CUDALAUNCH``
-        disable the use of the lower-latency cudaLaunchKernel API even when supported (CUDA >=v7.0).
-        Should only be used for benchmarking purposes.
-
  ``GMX_DISABLE_CUDA_TIMING``
          Deprecated. Use ``GMX_DISABLE_GPU_TIMING`` instead.
  
+``GMX_GPU_DD_COMMS``
+        perform domain decomposition halo exchange communication operations (on coordinate and force buffers)
+        directly on GPU memory spaces, without the staging of data through CPU memory, where possible.
+
+``GMX_GPU_PME_PP_COMMS``
+        when the simulation uses a separate PME rank, perform communication operations between the PP and PME ranks
+        (for coordinate and force buffers) directly on GPU memory spaces, without the staging of data through CPU
+        memory, where possible.
+
+``GMX_GPU_SYCL_NO_SYNCHRONIZE``
+        disable synchronizations between different GPU streams in SYCL builds, instead relying on the SYCL runtime to
+        do scheduling based on data dependencies. Experimental.
+
  ``GMX_CYCLE_ALL``
          times all code during runs.  Incompatible with threads.
  
@@ -230,6 +235,14 @@ Performance and Run Control
  ``GMX_FORCE_UPDATE``
          update forces when invoking ``mdrun -rerun``.
  
+``GMX_FORCE_UPDATE_DEFAULT_GPU``
+        Force update to run on the GPU by default, overriding the ``mdrun -update auto`` option. Works similarly to setting
+        ``mdrun -update gpu``, but (1) falls back to the CPU code-path if set with input that is not supported, and
+        (2) can be used to run update on GPUs in multi-rank cases. The latter case should be
+        considered experimental since it lacks substantial testing. Also, GPU update is only supported with GPU direct
+        communications, so in the multi-rank case ``GMX_FORCE_UPDATE_DEFAULT_GPU`` should be set together with the ``GMX_GPU_DD_COMMS``
+        and ``GMX_GPU_PME_PP_COMMS`` environment variables. Does not override ``mdrun -update cpu``.
+
  ``GMX_GPU_ID``
          set in the same way as ``mdrun -gpu_id``, ``GMX_GPU_ID``
          allows the user to specify different GPU IDs for different ranks, which can be useful for selecting different
@@ -241,6 +254,10 @@ Performance and Run Control
          runtime permits this variable to be different for different ranks. Cannot be used
          in conjunction with ``mdrun -gputasks``. Has all the same requirements as ``mdrun -gputasks``.
  
+``GMX_GPU_DISABLE_COMPATIBILITY_CHECK``
+        Disables the hardware compatibility check in OpenCL and SYCL. Useful for developers
+        and allows testing the OpenCL/SYCL kernels on non-supported platforms without source code modification.
+
  ``GMX_IGNORE_FSYNC_FAILURE_ENV``
          allow :ref:`gmx mdrun` to continue even if
          a file is missing.
@@ -253,17 +270,12 @@ Performance and Run Control
          if set to -1, :ref:`gmx mdrun` will
          not exit if it produces too many LINCS warnings.
  
-``GMX_NB_GENERIC``
-        use the generic C kernel.  Should be set if using
-        the group-based cutoff scheme and also sets ``GMX_NO_SOLV_OPT`` to be true,
-        thus disabling solvent optimizations as well.
-
  ``GMX_NB_MIN_CI``
          neighbor list balancing parameter used when running on GPU. Sets the
          target minimum number of pair-lists in order to improve multi-processor load-balance for better
          performance with small simulation systems. Must be set to a non-negative integer,
          the 0 value disables list splitting.
-        The default value is optimized for supported GPUs (NVIDIA Fermi to Maxwell),
+        The default value is optimized for supported GPUs
          therefore changing it is not necessary for normal usage, but it can be useful on future architectures.
  
  ``GMX_NBLISTCG``
@@ -291,9 +303,6 @@ Performance and Run Control
  ``GMX_NOOPTIMIZEDKERNELS``
          deprecated, use ``GMX_DISABLE_SIMD_KERNELS`` instead.
  
-``GMX_NO_ALLVSALL``
-        disables optimized all-vs-all kernels.
-
  ``GMX_NO_CART_REORDER``
          used in initializing domain decomposition communicators. Rank reordering
          is default, but can be switched off with this environment variable.
@@ -321,9 +330,9 @@ Performance and Run Control
  ``GMX_NOPREDICT``
          shell positions are not predicted.
  
-``GMX_NO_SOLV_OPT``
-        turns off solvent optimizations; automatic if ``GMX_NB_GENERIC``
-        is enabled.
+``GMX_NO_UPDATEGROUPS``
+        turns off update groups. May allow for a decomposition of more
+        domains for small systems at the cost of communication during update.
  
  ``GMX_NSCELL_NCG``
          the ideal number of charge groups per neighbor searching grid cell is hard-coded
@@ -332,7 +341,7 @@ Performance and Run Control
  
  ``GMX_PME_NUM_THREADS``
          set the number of OpenMP or PME threads; overrides the default set by
-        :ref:`gmx mdrun`; can be used instead of the `-npme` command line option,
+        :ref:`gmx mdrun`; can be used instead of the ``-npme`` command line option,
          also useful to set heterogeneous per-process/-node thread count.
  
  ``GMX_PME_P3M``
@@ -341,7 +350,7 @@ Performance and Run Control
  ``GMX_PME_THREAD_DIVISION``
          PME thread division in the format "x y z" for all three dimensions. The
          sum of the threads in each dimension must equal the total number of PME threads (set in
-        `GMX_PME_NTHREADS`).
+        :envvar:`GMX_PME_NTHREADS`).
  
  ``GMX_PMEONEDD``
          if the number of domain decomposition cells is set to 1 for both x and y,
@@ -354,11 +363,6 @@ Performance and Run Control
          require the use of tabulated Coulombic
          and van der Waals interactions.
  
-``GMX_SCSIGMA_MIN``
-        the minimum value for soft-core sigma. **Note** that this value is set
-        using the :mdp:`sc-sigma` keyword in the :ref:`mdp` file, but this environment variable can be used
-        to reproduce pre-4.5 behavior with respect to this parameter.
-
  ``GMX_TPIC_MASSES``
          should contain multiple masses used for test particle insertion into a cavity.
          The center of mass of the last atoms is used for insertion into the cavity.
@@ -373,7 +377,7 @@ Performance and Run Control
  ``HWLOC_XMLFILE``
          Not strictly a |Gromacs| environment variable, but on large machines
          the hwloc detection can take a few seconds if you have lots of MPI processes.
-        If you run the hwloc command `lstopo out.xml` and set this environment
+        If you run the hwloc command :command:`lstopo out.xml` and set this environment
          variable to point to the location of this file, the hwloc library will use
          the cached information instead, which can be faster.
  
@@ -393,10 +397,6 @@ Performance and Run Control
          by mdrun. Values should be between the pruning frequency value
          (1 for CPU and 2 for GPU) and :mdp:`nstlist` ``- 1``.
  
-``GMX_USE_TREEREDUCE``
-        use tree reduction for nbnxn force reduction. Potentially faster for large number of
-        OpenMP threads (if memory locality is important).
-
  .. _opencl-management:
  
  OpenCL management
@@ -423,6 +423,7 @@ compilation of OpenCL kernels, but they are also used in device selection.
  
  ``GMX_OCL_DISABLE_FASTMATH``
          Prevents the use of ``-cl-fast-relaxed-math`` compiler option.
+        Note: fast math is always disabled on Intel devices due to instability.
  
  ``GMX_OCL_DUMP_LOG``
          If defined, the OpenCL build log is always written to the
@@ -442,8 +443,8 @@ compilation of OpenCL kernels, but they are also used in device selection.
          ``GMX_OCL_NOGENCACHE``).
  
              - NVIDIA GPUs: PTX code is saved in the current directory
-             with the name ``device_name.ptx``
-           - AMD GPUs: ``.IL/.ISA`` files will be created for each OpenCL
+              with the name ``device_name.ptx``
+            - AMD GPUs: ``.IL/.ISA`` files will be created for each OpenCL
                kernel built.  For details about where these files are
                created check AMD documentation for ``-save-temps`` compiler
                option.
@@ -470,55 +471,23 @@ compilation of OpenCL kernels, but they are also used in device selection.
          Enables i-atom data (type or LJ parameter) prefetch allowing
          testing on platforms where this behavior is not default.
  
-``GMX_OCL_NB_ANA_EWALD``
-        Forces the use of analytical Ewald kernels. Equivalent of
-        CUDA environment variable ``GMX_CUDA_NB_ANA_EWALD``
-
-``GMX_OCL_NB_TAB_EWALD``
-        Forces the use of tabulated Ewald kernel. Equivalent
-        of CUDA environment variable ``GMX_OCL_NB_TAB_EWALD``
-
-``GMX_OCL_NB_EWALD_TWINCUT``
-        Forces the use of twin-range cutoff kernel. Equivalent of
-        CUDA environment variable ``GMX_CUDA_NB_EWALD_TWINCUT``
-
  ``GMX_OCL_FILE_PATH``
          Use this parameter to force |Gromacs| to load the OpenCL
          kernels from a custom location. Use it only if you want to
          override |Gromacs| default behavior, or if you want to test
          your own kernels.
  
-``GMX_OCL_DISABLE_COMPATIBILITY_CHECK``
-        Disables the hardware compatibility check. Useful for developers
-        and allows testing the OpenCL kernels on non-supported platforms
-        (like Intel iGPUs) without source code modification.
+``GMX_OCL_SHOW_DIAGNOSTICS``
+        Use Intel OpenCL extension to show additional runtime performance
+        diagnostics.
  
  Analysis and Core Functions
  ---------------------------
-``GMX_QM_ACCURACY``
-        accuracy in Gaussian L510 (MC-SCF) component program.
-
-``GMX_QM_ORCA_BASENAME``
-        prefix of :ref:`tpr` files, used in Orca calculations
-        for input and output file names.
-
-``GMX_QM_CPMCSCF``
-        when set to a nonzero value, Gaussian QM calculations will
-        iteratively solve the CP-MCSCF equations.
-
-``GMX_QM_MODIFIED_LINKS_DIR``
-        location of modified links in Gaussian.
  
  ``DSSP``
          used by :ref:`gmx do_dssp` to point to the ``dssp``
          executable (not just its path).
  
-``GMX_QM_GAUSS_DIR``
-        directory where Gaussian is installed.
-
-``GMX_QM_GAUSS_EXE``
-        name of the Gaussian executable.
-
  ``GMX_DIPOLE_SPACING``
          spacing used by :ref:`gmx dipoles`.
  
@@ -531,9 +500,6 @@ Analysis and Core Functions
          terminal residues (NXXX and CXXX) as :ref:`rtp` entries that are normally renamed. Setting
          this environment variable disables this renaming.
  
-``GMX_PATH_GZIP``
-        ``gunzip`` executable, used by :ref:`gmx wham`.
-
  ``GMX_FONT``
          name of X11 font used by :ref:`gmx view`.
  
@@ -541,9 +507,6 @@ Analysis and Core Functions
          the time unit used in output files, can be
          anything in fs, ps, ns, us, ms, s, m or h.
  
-``GMX_QM_GAUSSIAN_MEMORY``
-        memory used for Gaussian QM calculation.
-
  ``MULTIPROT``
          name of the ``multiprot`` executable, used by the
          contributed program ``do_multiprot``.
@@ -551,15 +514,6 @@ Analysis and Core Functions
  ``NCPUS``
          number of CPUs to be used for Gaussian QM calculation
  
-``GMX_ORCA_PATH``
-        directory where Orca is installed.
-
-``GMX_QM_SA_STEP``
-        simulated annealing step size for Gaussian QM calculation.
-
-``GMX_QM_GROUND_STATE``
-        defines state for Gaussian surface hopping calculation.
-
  ``GMX_TOTAL``
          name of the ``total`` executable used by the contributed
          ``do_shift`` program.
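
The ``GMX_FORCE_UPDATE_DEFAULT_GPU`` entry added in this patch says the variable should be set together with ``GMX_GPU_DD_COMMS`` and ``GMX_GPU_PME_PP_COMMS`` in the multi-rank case. A minimal shell sketch of that combination might look as follows. The value ``1`` is an assumption (the documentation only says the variables should be set), and the run name ``md`` is hypothetical, so the script prints the launch line rather than executing it.

```shell
#!/bin/sh
# Sketch only: the documentation says these variables should be "set";
# the value 1 used here is an assumption, not a documented requirement.
export GMX_GPU_DD_COMMS=1              # GPU-direct halo exchange
export GMX_GPU_PME_PP_COMMS=1          # GPU-direct PP<->PME communication
export GMX_FORCE_UPDATE_DEFAULT_GPU=1  # run update on the GPU by default

# Hypothetical launch line; "md" is a placeholder run name.
MDRUN_CMD="gmx mdrun -deffnm md"
echo "would run: $MDRUN_CMD"
```

Because the entry notes that the variable does not override ``mdrun -update cpu``, passing that option on the command line would still keep the update on the CPU.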