docs/release-notes/2021/major/performance.rst

Performance improvements
^^^^^^^^^^^^^^^^^^^^^^^^

.. Note to developers!
   Please use """"""" to underline the individual entries for fixed issues in the subfolders,
   otherwise the formatting on the webpage is messed up.
   Also, please use the syntax :issue:`number` to reference issues on GitLab, without
   a space between the colon and number!

Added support for multiple time-stepping
""""""""""""""""""""""""""""""""""""""""

A two-level multiple time-stepping scheme has been implemented.
Any combination of five different force groups can be selected
to be evaluated less frequently, thereby improving performance.

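To illustrate what evaluating a force group less frequently means, here is a
minimal sketch of a generic two-level impulse multiple time-stepping scheme for a
single particle in one dimension, using a leap-frog integrator. It is not the
GROMACS implementation: all names are hypothetical, and the sketch assumes that
the slow force is evaluated every ``factor`` steps and scaled by ``factor``, so
that on average it delivers the same impulse as when evaluated every step.

.. code-block:: python

   from typing import Callable, Tuple

   Force = Callable[[float], float]


   def mts_leapfrog(x: float, v: float, mass: float, dt: float, nsteps: int,
                    fast_force: Force, slow_force: Force,
                    factor: int) -> Tuple[float, float, int]:
       """Two-level impulse MTS with leap-frog (hypothetical, illustrative only).

       Returns the final position, the final half-step velocity, and the
       number of slow-force evaluations.
       """
       if factor < 1:
           raise ValueError("factor must be >= 1")
       slow_evaluations = 0
       for step in range(nsteps):
           force = fast_force(x)  # fast group: evaluated every step
           if step % factor == 0:
               # Slow group: only every `factor` steps, scaled by `factor`
               # to keep the same average impulse.
               force += factor * slow_force(x)
               slow_evaluations += 1
           v += force / mass * dt  # v(t + dt/2) from v(t - dt/2)
           x += v * dt             # x(t + dt) from v(t + dt/2)
       return x, v, slow_evaluations


   def stiff_spring(x: float) -> float:
       return -100.0 * x  # fast-varying, stiff force


   def soft_spring(x: float) -> float:
       return -1.0 * x  # slowly varying, weak force


   x, v, n_slow = mts_leapfrog(1.0, 0.0, 1.0, 0.01, 1000,
                               stiff_spring, soft_spring, factor=2)
   print(n_slow)  # prints 500

With ``factor=2`` the slow force is computed on only half of the steps, which is
where the savings come from when the slow force group is expensive to evaluate.
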
Extend supported use-cases for GPU version of update and constraints
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The GPU version of update and constraints can now be used with free-energy
perturbation (FEP), except when masses or constraints are perturbed.

Reduce time spent in grompp with large numbers of distance restraints
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The time `gmx grompp` spends processing distance restraints now scales
linearly, rather than quadratically, with the number of restraints.

:issue:`3457`

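The note does not say how the processing was changed. As a generic,
hypothetical illustration of how such a cost can drop from quadratic to linear,
the sketch below groups distance-restraint atom pairs by their restraint label,
first by rescanning the whole list for every new label, then in a single pass
using a dictionary. The tuple layout and function names are assumptions made for
this example and are not taken from ``gmx grompp``.

.. code-block:: python

   from typing import Dict, List, Tuple

   # Hypothetical layout: (atom_i, atom_j, label); pairs sharing a label
   # belong to the same restraint.
   Pair = Tuple[int, int, int]
   Groups = Dict[int, List[Tuple[int, int]]]


   def group_by_label_quadratic(pairs: List[Pair]) -> Groups:
       """O(N^2) when labels are mostly unique: rescans all pairs per label."""
       groups: Groups = {}
       for _, _, label in pairs:
           if label not in groups:
               groups[label] = [(i, j) for i, j, lab in pairs if lab == label]
       return groups


   def group_by_label_linear(pairs: List[Pair]) -> Groups:
       """O(N): a single pass, appending each pair to its label's list."""
       groups: Groups = {}
       for i, j, label in pairs:
           groups.setdefault(label, []).append((i, j))
       return groups


   pairs = [(1, 5, 0), (2, 6, 0), (3, 7, 1)]
   print(group_by_label_linear(pairs))  # prints {0: [(1, 5), (2, 6)], 1: [(3, 7)]}

Both functions return the same grouping; only the amount of work grows
differently with the number of pairs.
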
Support for offloading PME to GPU when doing Coulomb FEP
""""""""""""""""""""""""""""""""""""""""""""""""""""""""

PME calculations can now be offloaded to a GPU when doing Coulomb free-energy perturbation.

CPU SIMD accelerated implementation of harmonic bonds
"""""""""""""""""""""""""""""""""""""""""""""""""""""

SIMD acceleration of harmonic bonds slightly improves performance for systems
in which only bonds involving hydrogen atoms are constrained, or in which no
constraints are used. The improvement is significant with multiple time-stepping.

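The following minimal sketch illustrates the data-parallel structure that SIMD
vectorization exploits, using pure Python rather than actual SIMD intrinsics:
bonds are processed in batches of ``width`` lanes, coordinates are gathered into
per-lane lists, the same arithmetic is applied to every lane, and forces are
scattered back to the atoms. It evaluates the harmonic bond potential
``V = 0.5 * k * (r - b0)**2``; the function name, argument layout, and default
lane width of 4 are assumptions for illustration, not the GROMACS kernel.

.. code-block:: python

   import math
   from typing import List, Sequence, Tuple

   Vec = Tuple[float, float, float]


   def harmonic_bonds_batched(coords: Sequence[Vec],
                              bonds: Sequence[Tuple[int, int]],
                              k: Sequence[float],
                              b0: Sequence[float],
                              width: int = 4) -> Tuple[float, List[List[float]]]:
       """Energy and forces for V = 0.5 * k * (r - b0)**2, in batches of `width` lanes.

       Hypothetical illustration of SIMD-style data parallelism, not a real kernel.
       """
       forces = [[0.0, 0.0, 0.0] for _ in coords]
       energy = 0.0
       for start in range(0, len(bonds), width):
           batch = list(range(start, min(start + width, len(bonds))))
           # Gather: one displacement vector r_ij = x_i - x_j per lane.
           dx = [[coords[bonds[b][0]][d] - coords[bonds[b][1]][d] for d in range(3)]
                 for b in batch]
           # Same arithmetic on every lane (one vector instruction per step in SIMD).
           r = [math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2) for v in dx]
           dr = [r[lane] - b0[b] for lane, b in enumerate(batch)]
           energy += sum(0.5 * k[b] * dr[lane] ** 2 for lane, b in enumerate(batch))
           # F_i = -k (r - b0) r_ij / r; assumes r > 0 for every bond.
           scale = [-k[b] * dr[lane] / r[lane] for lane, b in enumerate(batch)]
           # Scatter: add F_i to atom i and -F_i to atom j.
           for lane, b in enumerate(batch):
               i, j = bonds[b]
               for d in range(3):
                   f = scale[lane] * dx[lane][d]
                   forces[i][d] += f
                   forces[j][d] -= f
       return energy, forces


   coords = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
   energy, forces = harmonic_bonds_batched(coords, [(0, 1)], k=[1000.0], b0=[0.1])
   print(energy, forces[0], forces[1])

Here the bond is stretched by 0.1 beyond ``b0``, so the energy is about 5.0 and
the two atoms are pulled toward each other with forces of magnitude about 100
along x.
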
Allow offloading GPU update and constraints without direct GPU communication
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Parallel runs using domain decomposition or separate PME ranks can now offload
update and constraints to a GPU with CUDA without also requiring the (experimental)
direct GPU communication features to be enabled.

Tune CUDA short-range nonbonded kernel parameters on NVIDIA Volta and Ampere A100
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Recent compilers allowed re-tuning the nonbonded kernel defaults on NVIDIA Volta and
Ampere A100 GPUs, which improves the performance of the Ewald kernels, especially
those that also compute energies.