Unify handling of GMX_ENABLE_GPU_TIMING and GMX_DISABLE_GPU_TIMING

diff --git a/docs/user-guide/environment-variables.rst b/docs/user-guide/environment-variables.rst

index 4a19a2e208fad79277a53a92abbe59c0f9330e93..1f323174e1124da09523dbf792d20286fd844d23 100644
--- a/docs/user-guide/environment-variables.rst
+++ b/docs/user-guide/environment-variables.rst
@@ -22,9 +22,6 @@ you should consult your local documentation for details.
  
  Output Control
  --------------
-``GMX_CONSTRAINTVIR``
-        Print constraint virial and force virial energy terms.
-
  ``GMX_DUMP_NL``
          Neighbour list dump level; default 0.
  
@@ -85,8 +82,8 @@ Output Control
          files. Set to 0 for quiet operation.
  
  ``GMX_ENABLE_GPU_TIMING``
-        Enables GPU timings in the log file for CUDA. Note that CUDA timings
-        are incorrect with multiple streams, as happens with domain
+        Enables GPU timings in the log file for CUDA and SYCL. Note that CUDA
+        timings are incorrect with multiple streams, as happens with domain
          decomposition or with both non-bondeds and PME on the GPU (this is
          also the main reason why they are not turned on by default).
  
@@ -125,6 +122,12 @@ Debugging
          arrive first. Setting this variable switches to the generic path with fixed waiting
          order.
  
+``GMX_TEST_REQUIRED_NUMBER_OF_DEVICES``
+        sets the number of GPUs required by the test suite. By default, the test suite
+        falls back to using the CPU if GPUs could not be detected. Set it to a positive integer value
+        to ensure that at least this number of usable GPUs are detected. Default:
+        0 (not testing GPU availability).
+
  There are a number of extra environment variables like these
  that are used in debugging - check the code!
  
@@ -141,15 +144,15 @@ Performance and Run Control
          to localized bonded interaction distribution; optimal value dependent on
          system and hardware, default value is 4.
  
-``GMX_CUDA_NB_EWALD_TWINCUT``
+``GMX_GPU_NB_EWALD_TWINCUT``
          force the use of twin-range cutoff kernel even if :mdp:`rvdw` equals
          :mdp:`rcoulomb` after PP-PME load balancing. The switch to twin-range kernels is automated,
          so this variable should be used only for benchmarking.
  
-``GMX_CUDA_NB_ANA_EWALD``
+``GMX_GPU_NB_ANA_EWALD``
          force the use of analytical Ewald kernels. Should be used only for benchmarking.
  
-``GMX_CUDA_NB_TAB_EWALD``
+``GMX_GPU_NB_TAB_EWALD``
          force the use of tabulated Ewald kernels. Should be used only for benchmarking.
  
  ``GMX_DISABLE_CUDA_TIMING``
@@ -164,6 +167,10 @@ Performance and Run Control
          (for coordinate and force buffers) directly on GPU memory spaces, without the staging of data through CPU
          memory, where possible. 
  
+``GMX_GPU_SYCL_NO_SYNCHRONIZE``
+        disable synchronizations between different GPU streams in SYCL builds, instead relying on the SYCL runtime to
+        do scheduling based on data dependencies. Experimental.
+
  ``GMX_CYCLE_ALL``
          times all code during runs.  Incompatible with threads.
  
@@ -204,6 +211,7 @@ Performance and Run Control
  ``GMX_DISABLE_GPU_TIMING``
          timing of asynchronously executed GPU operations can have a
          non-negligible overhead with short step times. Disabling timing can improve performance in these cases.
+        Timings are disabled by default with CUDA and SYCL.
  
  ``GMX_DISABLE_GPU_DETECTION``
          when set, disables GPU detection even if :ref:`gmx mdrun` was compiled
@@ -247,6 +255,10 @@ Performance and Run Control
          runtime permits this variable to be different for different ranks. Cannot be used
          in conjunction with ``mdrun -gputasks``. Has all the same requirements as ``mdrun -gputasks``.
  
+``GMX_GPU_DISABLE_COMPATIBILITY_CHECK``
+        Disables the hardware compatibility check in OpenCL and SYCL. Useful for developers
+        and allows testing the OpenCL/SYCL kernels on non-supported platforms without source code modification.
+
  ``GMX_IGNORE_FSYNC_FAILURE_ENV``
          allow :ref:`gmx mdrun` to continue even if
          a file is missing.
@@ -352,11 +364,6 @@ Performance and Run Control
          require the use of tabulated Coulombic
          and van der Waals interactions.
  
-``GMX_SCSIGMA_MIN``
-        the minimum value for soft-core sigma. **Note** that this value is set
-        using the :mdp:`sc-sigma` keyword in the :ref:`mdp` file, but this environment variable can be used
-        to reproduce pre-4.5 behavior with respect to this parameter.
-
  ``GMX_TPIC_MASSES``
          should contain multiple masses used for test particle insertion into a cavity.
          The center of mass of the last atoms is used for insertion into the cavity.
@@ -391,10 +398,6 @@ Performance and Run Control
          by mdrun. Values should be between the pruning frequency value
          (1 for CPU and 2 for GPU) and :mdp:`nstlist` ``- 1``.
  
-``GMX_USE_TREEREDUCE``
-        use tree reduction for nbnxn force reduction. Potentially faster for large number of
-        OpenMP threads (if memory locality is important).
-
  .. _opencl-management:
  
  OpenCL management
@@ -421,6 +424,7 @@ compilation of OpenCL kernels, but they are also used in device selection.
  
  ``GMX_OCL_DISABLE_FASTMATH``
          Prevents the use of ``-cl-fast-relaxed-math`` compiler option.
+        Note: fast math is always disabled on Intel devices due to instability.
  
  ``GMX_OCL_DUMP_LOG``
          If defined, the OpenCL build log is always written to the
@@ -440,8 +444,8 @@ compilation of OpenCL kernels, but they are also used in device selection.
          ``GMX_OCL_NOGENCACHE``).
  
              - NVIDIA GPUs: PTX code is saved in the current directory
-             with the name ``device_name.ptx``
-           - AMD GPUs: ``.IL/.ISA`` files will be created for each OpenCL
+              with the name ``device_name.ptx``
+            - AMD GPUs: ``.IL/.ISA`` files will be created for each OpenCL
                kernel built.  For details about where these files are
                created check AMD documentation for ``-save-temps`` compiler
                option.
@@ -468,29 +472,12 @@ compilation of OpenCL kernels, but they are also used in device selection.
          Enables i-atom data (type or LJ parameter) prefetch allowing
          testing on platforms where this behavior is not default.
  
-``GMX_OCL_NB_ANA_EWALD``
-        Forces the use of analytical Ewald kernels. Equivalent of
-        CUDA environment variable ``GMX_CUDA_NB_ANA_EWALD``
-
-``GMX_OCL_NB_TAB_EWALD``
-        Forces the use of tabulated Ewald kernel. Equivalent
-        of CUDA environment variable ``GMX_OCL_NB_TAB_EWALD``
-
-``GMX_OCL_NB_EWALD_TWINCUT``
-        Forces the use of twin-range cutoff kernel. Equivalent of
-        CUDA environment variable ``GMX_CUDA_NB_EWALD_TWINCUT``
-
  ``GMX_OCL_FILE_PATH``
          Use this parameter to force |Gromacs| to load the OpenCL
          kernels from a custom location. Use it only if you want to
          override |Gromacs| default behavior, or if you want to test
          your own kernels.
  
-``GMX_OCL_DISABLE_COMPATIBILITY_CHECK``
-        Disables the hardware compatibility check. Useful for developers
-        and allows testing the OpenCL kernels on non-supported platforms
-        (like Intel iGPUs) without source code modification.
-
  ``GMX_OCL_SHOW_DIAGNOSTICS``
          Use Intel OpenCL extension to show additional runtime performance
          diagnostics.
@@ -514,9 +501,6 @@ Analysis and Core Functions
          terminal residues (NXXX and CXXX) as :ref:`rtp` entries that are normally renamed. Setting
          this environment variable disables this renaming.
  
-``GMX_PATH_GZIP``
-        ``gunzip`` executable, used by :ref:`gmx wham`.
-
  ``GMX_FONT``
          name of X11 font used by :ref:`gmx view`.
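
The renames and removals in the hunks above can affect job scripts that still export the old names. The following is a minimal sketch, not part of the commit, of a helper that scans an environment mapping for them. The function name `audit_environment` and both tables are hypothetical. The three `GMX_CUDA_NB_*` renames are read directly from the diff. Mapping the removed `GMX_OCL_*` entries onto the `GMX_GPU_*` names is an assumption drawn from the removed text calling them equivalents of the CUDA variables. The diff also only shows that four variables were dropped from the documentation; whether the code still honors them is not stated here.

```python
import os

# Old name -> new name.
# The GMX_CUDA_NB_* entries are explicit renames in the diff above.
# The GMX_OCL_* entries are an ASSUMPTION: the diff removes them and the
# remaining GMX_GPU_* text is backend-neutral, but no rename is stated.
RENAMED = {
    "GMX_CUDA_NB_EWALD_TWINCUT": "GMX_GPU_NB_EWALD_TWINCUT",
    "GMX_CUDA_NB_ANA_EWALD": "GMX_GPU_NB_ANA_EWALD",
    "GMX_CUDA_NB_TAB_EWALD": "GMX_GPU_NB_TAB_EWALD",
    "GMX_OCL_NB_EWALD_TWINCUT": "GMX_GPU_NB_EWALD_TWINCUT",
    "GMX_OCL_NB_ANA_EWALD": "GMX_GPU_NB_ANA_EWALD",
    "GMX_OCL_NB_TAB_EWALD": "GMX_GPU_NB_TAB_EWALD",
    "GMX_OCL_DISABLE_COMPATIBILITY_CHECK": "GMX_GPU_DISABLE_COMPATIBILITY_CHECK",
}

# Entries deleted from the documentation by this diff with no replacement shown.
NO_LONGER_DOCUMENTED = {
    "GMX_CONSTRAINTVIR",
    "GMX_SCSIGMA_MIN",
    "GMX_USE_TREEREDUCE",
    "GMX_PATH_GZIP",
}


def audit_environment(env):
    """Return (renames, undocumented) for the variables present in env.

    renames maps each old name found in env to its suggested new name;
    undocumented is a sorted list of names found in env that the diff
    removed from the documentation.
    """
    renames = {old: new for old, new in RENAMED.items() if old in env}
    undocumented = sorted(name for name in NO_LONGER_DOCUMENTED if name in env)
    return renames, undocumented


if __name__ == "__main__":
    # Report on the current process environment.
    found_renames, found_undocumented = audit_environment(os.environ)
    for old, new in found_renames.items():
        print(f"{old} is set; the documentation now lists {new}")
    for name in found_undocumented:
        print(f"{name} is set but is no longer documented")
```

The helper only reports; it does not rewrite the environment, since the semantics of any given variable may have changed along with its name.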