Alexey Shvetsov [Mon, 26 Sep 2022 08:23:52 +0000 (11:23 +0300)]
Merge remote-tracking branch 'origin/main' into densgrid
Alexey Shvetsov [Mon, 26 Sep 2022 08:23:22 +0000 (11:23 +0300)]
Add headers
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Alexey Shvetsov [Mon, 26 Sep 2022 08:22:27 +0000 (11:22 +0300)]
Fix formatting
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Alexey Shvetsov [Mon, 26 Sep 2022 08:21:20 +0000 (11:21 +0300)]
Fix for formatting
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Paul Bauer [Fri, 23 Sep 2022 15:02:51 +0000 (15:02 +0000)]
Use std::filesystem in more places
Replace more custom gmx file handling with std::filesystem based
approaches. Remove GMX_MAX_PATH_LENGTH variable as it is no longer
needed.
Szilárd Páll [Fri, 23 Sep 2022 13:19:04 +0000 (13:19 +0000)]
Optimize bonded force accumulation in SYCL kernels
On Intel KBL performance of the force kernel improves by 3x.
Performance improvements with hipSYCL are modest, likely due to
other performance limiters in the kernel.
Refs #3928
Andrey Alekseenko [Fri, 23 Sep 2022 12:25:31 +0000 (12:25 +0000)]
SYCL: Remove support for buffers
- They are not compatible with GPU-aware MPI
- All the optimization efforts so far led us away from (implicit) DAG-based
scheduling.
- They require non-zero maintenance and CI resources.
Closes #4603, #3895.
Andrey Alekseenko [Fri, 23 Sep 2022 10:48:20 +0000 (10:48 +0000)]
SYCL: Remove deprecated is_host
It is deprecated in SYCL2020, and IntelLLVM now complains about it.
We had this check to avoid querying device properties for the `host`
device, which was throwing exceptions.
Since 9eab63bacd (!3126), we can handle exceptions just fine.
Moreover, since ecb223ff5 (!2653), we're only listing GPU devices,
so the host device would never be passed to this function anyway.
Andrey Alekseenko [Fri, 23 Sep 2022 07:59:41 +0000 (07:59 +0000)]
Avoid delayed hardware de-initialization in tests
In our tests, `gmx_hw_info_t` was only destroyed at the very end of the
program as part of destroying the static `TestHardwareEnvironment`. Since
the order of destruction of static objects is unspecified, it was
sometimes destroyed after hipSYCL shut down the rest of the CUDA/HIP runtime.
With CUDA, it led to harmless `cudaErrorCudartUnloading` errors printed to
stderr. With HIP, however, it was sometimes causing segfaults in
libamdhip64.so. Only in tests and only at the very end, but still not nice.
We now explicitly destroy `gmx_hw_info_t` after running all the tests.
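
The ordering issue described above can be illustrated with a minimal sketch. `HardwareInfoSketch` and `TestEnvironmentSketch` are hypothetical stand-ins, not the GROMACS types; the sketch only assumes that the environment owns the hardware info through a smart pointer, so that an explicit reset gives a deterministic teardown point instead of relying on static destruction order.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an object that holds GPU runtime resources.
struct HardwareInfoSketch
{
    explicit HardwareInfoSketch(bool* aliveFlag) : alive_(aliveFlag) { *alive_ = true; }
    ~HardwareInfoSketch() { *alive_ = false; }
    bool* alive_;
};

// Hypothetical test environment that owns the hardware info.
class TestEnvironmentSketch
{
public:
    void setUp(bool* aliveFlag) { hwinfo_ = std::make_unique<HardwareInfoSketch>(aliveFlag); }
    // Called explicitly after all tests, while the GPU runtime is still loaded.
    void tearDown() { hwinfo_.reset(); }
    bool hasHardwareInfo() const { return hwinfo_ != nullptr; }

private:
    std::unique_ptr<HardwareInfoSketch> hwinfo_;
};

// Returns true if the resource is released by the explicit tearDown() call,
// that is, before any static destructor has a chance to run.
inline bool releasedBeforeExit()
{
    static TestEnvironmentSketch env; // static storage, like the real environment
    bool                         alive = false;
    env.setUp(&alive);
    assert(alive);
    env.tearDown();
    return !alive && !env.hasHardwareInfo();
}
```

Calling `releasedBeforeExit()` from a test runner's `main` after the tests have finished mirrors the explicit destruction described in this commit.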
M. Eric Irrgang [Thu, 22 Sep 2022 15:18:50 +0000 (18:18 +0300)]
Remove inappropriate `#include`
c40e470d neglected to remove an `#include` that
referenced `<mpi.h>`, which might not be available
in some build environments.
Refs #4608
Paul Bauer [Thu, 22 Sep 2022 16:18:11 +0000 (16:18 +0000)]
Rename reference of master rank to main
Used coordinator in some places as well where it made sense to me.
Refs #4459
Andrey Alekseenko [Thu, 22 Sep 2022 15:23:10 +0000 (15:23 +0000)]
Remove the deprecated ddSendrecv template
And some adjacent modernization, mostly replacing `rvec*` with
`gmx::ArrayRef<gmx::RVec>`.
M. Eric Irrgang [Thu, 22 Sep 2022 14:53:40 +0000 (14:53 +0000)]
Remove a duplicated export.
`export_mpi_bindings()` was supposed to be moved to a narrower scope so
that it only gets called when MPI bindings support is available, as
detected by CMake.
Refs #4608
Paul Bauer [Thu, 22 Sep 2022 08:00:05 +0000 (10:00 +0200)]
Add new cool quote
Not to be taken out of context.
Szilárd Páll [Mon, 19 Sep 2022 17:56:59 +0000 (19:56 +0200)]
Add missing __forceinline__ attribute in the bonded module
clang-CUDA was not inlining and had ~4x worse performance than nvcc.
Berk Hess [Tue, 20 Sep 2022 15:22:45 +0000 (17:22 +0200)]
Fix nbnxm 2xMM energy group accumulation
Broken by recent commit 0ad4a2ede3f.
Roland Schulz [Thu, 22 Sep 2022 11:16:06 +0000 (11:16 +0000)]
SYCL: Catch exception in isDeviceCompatible
Mark Abraham [Fri, 16 Sep 2022 14:50:16 +0000 (16:50 +0200)]
Remove getIOpenEntry
The pair list often does operations on the last entry in the list,
which is the current work-in-progress. This can be expressed more
directly in the code.
This will help simplify the code so that we can template it on the
number of j-cluster indices that are packed together.
Refs #3847
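
A minimal sketch of the idea, assuming the list is a `std::vector` and that the entry under construction is always the last element. `ListEntrySketch` and the three helper functions are hypothetical names for illustration only, not the nbnxm data structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical list entry: an i-cluster with a half-open range of j-clusters.
struct ListEntrySketch
{
    int iCluster;
    int jBegin;
    int jEnd;
};

// Starts a new work-in-progress entry at the end of the list.
inline void openEntry(std::vector<ListEntrySketch>* list, int iCluster)
{
    const int jBegin = list->empty() ? 0 : list->back().jEnd;
    list->push_back({ iCluster, jBegin, jBegin });
}

// Extends the work-in-progress entry; no getter for an "open" index is needed.
inline void addJCluster(std::vector<ListEntrySketch>* list)
{
    assert(!list->empty());
    list->back().jEnd++;
}

// Finishes the work-in-progress entry, dropping it if it stayed empty.
inline void closeEntry(std::vector<ListEntrySketch>* list)
{
    assert(!list->empty());
    if (list->back().jEnd == list->back().jBegin)
    {
        list->pop_back();
    }
}
```

Using `back()` directly states the intent (operate on the entry being built) without a separate accessor that returns its index.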
M. Eric Irrgang [Wed, 21 Sep 2022 13:36:59 +0000 (13:36 +0000)]
Flexible communicator management for tMPI or MPI.
Manage the hierarchy of communicators (or non-MPI resources) whether
or not mpi4py is available for ensembles of any size, both for
thread-MPI GROMACS and GROMACS with library MPI support.
Fixes #4423
Closes #4423
Anatoly Titov [Tue, 20 Sep 2022 15:48:20 +0000 (18:48 +0300)]
- added comments and small adjustments
Berk Hess [Tue, 20 Sep 2022 14:37:10 +0000 (14:37 +0000)]
Replace t_blocka in index group handling by std containers
This change is only refactoring, except for minor changes in
printing of index groups in gmx dump and debug output.
Berk Hess [Tue, 20 Sep 2022 13:56:03 +0000 (13:56 +0000)]
Fix non-bonded group pair energy reporting
Recent commit 0ad4a2ede3f caused all energy contributions
to be added to group pair 0,0.
Szilárd Páll [Tue, 20 Sep 2022 13:15:26 +0000 (13:15 +0000)]
Optimize bonded force accumulation in CUDA kernels
Consecutive threads accumulate different force components reducing the
potential for atomic clashes (by a theoretical max of 3x).
The measured performance improvement is 1.35-1.5x in the force, and
1.5-1.6x in the force+energy kernel (on cc 7.x and 8.5).
Original idea: A. Gorenko StreamHPC.
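
The access pattern can be sketched on the host in plain C++. This is an illustration under an assumed rotation scheme (component index offset by the thread index); the actual kernel indexing may differ, and `componentForStep` and `accumulateForce` are hypothetical names.

```cpp
#include <array>
#include <cassert>

constexpr int c_dim = 3;

// Component that a given thread updates at a given step (assumed rotation).
// At any step, three consecutive threads touch three different components.
inline int componentForStep(int threadIndex, int step)
{
    assert(threadIndex >= 0 && step >= 0);
    return (threadIndex + step) % c_dim;
}

// Adds force f to target, visiting the components in the rotated order.
// In a GPU kernel each "+=" would be an atomic add to global memory.
inline void accumulateForce(std::array<float, c_dim>*       target,
                            const std::array<float, c_dim>& f,
                            int                             threadIndex)
{
    for (int step = 0; step < c_dim; ++step)
    {
        const int c = componentForStep(threadIndex, step);
        (*target)[c] += f[c];
    }
}
```

Each thread still adds all three components, so the result is unchanged; only the order differs, which is where the theoretical factor of three in reduced contention comes from.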
Paul Bauer [Tue, 20 Sep 2022 10:14:36 +0000 (10:14 +0000)]
Remove xlc compiler specific code
Also removes code that was needed for ancient versions of gcc in Power9
Simd instructions.
Paul Bauer [Tue, 20 Sep 2022 08:22:07 +0000 (08:22 +0000)]
Replace custom gmx PATH with std::filesystem
Remove all custom path handling for files and directories and replace
with std::filesystem based approach.
Reworked some useful methods to use std::filesystem under the hood but
keep the same call signatures.
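
As an illustration of keeping string-based call signatures while using std::filesystem internally, a sketch could look like the following. The helper names are hypothetical and are not the GROMACS functions; C++17 is assumed.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: string in, string out, std::filesystem under the hood.
inline std::string stripExtension(const std::string& filename)
{
    std::filesystem::path p(filename);
    // replace_extension() with no argument removes the extension.
    return p.replace_extension().string();
}

// Returns the last path component, e.g. the file name without its directory.
inline std::string baseName(const std::string& filename)
{
    return std::filesystem::path(filename).filename().string();
}

// Joins a directory and a file name; generic_string() always uses '/'.
inline std::string joinPath(const std::string& dir, const std::string& file)
{
    return (std::filesystem::path(dir) / file).generic_string();
}
```

Callers keep passing and receiving `std::string`, so existing call sites need no changes while separator handling is delegated to the standard library.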
M. Eric Irrgang [Tue, 20 Sep 2022 06:38:14 +0000 (06:38 +0000)]
Remove unused uncurated files from python_packaging.
* We no longer have a separate Dockerfile for isolated gmxapi docs builds.
* The Travis-CI config has not been tested in a while. We cannot stand behind it as a sample config for new forked projects.
M. Eric Irrgang [Mon, 19 Sep 2022 17:23:51 +0000 (17:23 +0000)]
Core bindings for MPI Comm sharing.
New creation function bindings for pycontext receive
and use an mpi4py communicator to initialize the
gmxapi::Context, where applicable.
Provides the C++ support needed for #4423,
and for the core functionality under #4422.
Ref #4422.
Mark Abraham [Fri, 16 Sep 2022 14:50:16 +0000 (16:50 +0200)]
Rename j4 to jPacked
Refs #3847
Mark Abraham [Mon, 19 Sep 2022 07:34:03 +0000 (07:34 +0000)]
Use classes for j-list handling
This prepares for future changes to the value of
c_nbnxnGpuNumClusterPerSupercluster and thus c_nbnxnGpuJGroupSize to
be more flexible in optimizing for a wider range of GPUs. This will be
easier to manage if the compile-time values come from templates rather
than being hard coded.
This commit begins to replace the concept of j4 clusters with packed j
clusters, since the number of elements packed per int may change in
future. The bulk of the renaming is deferred for a future change.
Refs #3847
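
A sketch of how such compile-time values could come from templates rather than hard-coded constants. The type names are hypothetical, and the relation used here (a 32-bit mask shared among the i-clusters, so the packed group size is 32 divided by the cluster count) is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>

// Assumption for this sketch: one 32-bit mask holds one bit per
// (i-cluster, packed j-cluster) pair.
constexpr int c_maskBits = 32;

// Hypothetical layout type: derives the packed j-group size at compile time.
template<int numClustersPerSupercluster>
struct PackedJLayoutSketch
{
    static_assert(numClustersPerSupercluster > 0, "need at least one cluster");
    static_assert(c_maskBits % numClustersPerSupercluster == 0,
                  "cluster count must divide the mask width");
    static constexpr int jGroupSize = c_maskBits / numClustersPerSupercluster;
};

// Hypothetical packed j-cluster group whose size follows from the layout.
template<typename Layout>
struct PackedJClustersSketch
{
    std::array<int, Layout::jGroupSize> jCluster{};
    static constexpr int                size() { return Layout::jGroupSize; }
};
```

With 8 clusters per supercluster this gives groups of 4, which matches the old "j4" name; other cluster counts would yield other group sizes, which is why "packed" is the more general term.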
M. Eric Irrgang [Fri, 16 Sep 2022 20:03:45 +0000 (20:03 +0000)]
Basic MPI client support.
* Help MPI-enabled client software build with compatible GROMACS.
* Establish some basic MPI Comm bindings for gmxapi.
* Test the ability to receive and use a communicator from mpi4py.
Ref #3718, #3777, #4422, #4447
Gaurav Garg [Fri, 16 Sep 2022 17:20:27 +0000 (17:20 +0000)]
Follow-up to GPU cycle counter change
- Rename cycle counter LaunchGpu to LaunchPpGpuOps as it contains only PP cycles
- Move LaunchGpuPmeFft to sub-counter table from PME breakdown table
Refs: #4490
Berk Hess [Fri, 16 Sep 2022 15:27:00 +0000 (15:27 +0000)]
Enable Shake in constr test
Shake is now enabled in addition to LINCS in the constr test.
To make Shake pass the velocity check, the shake tolerance had
to be tightened.
Fixes #4586
Change-Id: Id9573993981b024853767adcd62cad3639ccfbe5
Closes #4586
Paul Bauer [Fri, 16 Sep 2022 14:34:02 +0000 (14:34 +0000)]
Fix default checkout in post-merge trigger script
Changed HEAD to FETCH_HEAD, to make sure the script also runs when
no further input is provided.
Andrey Alekseenko [Fri, 16 Sep 2022 13:18:58 +0000 (13:18 +0000)]
Silence clang warning
In builds without wallcycle sub-counters, clang complains about one of
the helper functions being unused.
Introduced in 298eac7e3fa (MR !1767).
Failed pipeline: https://gitlab.com/gromacs/gromacs/-/pipelines/641767524
Fixed pipeline: https://gitlab.com/gromacs/gromacs/-/pipelines/642244320
The `arrayref` header was not needed, so I replaced it with
`basedefinitions`, which is needed now.
Roland Schulz [Mon, 12 Sep 2022 21:14:47 +0000 (14:14 -0700)]
SYCL: Simplify indexing
ejjordan [Thu, 15 Sep 2022 14:39:29 +0000 (16:39 +0200)]
Allow use of ccache with Intel oneAPI compilers
This works, it's just that no one had added intelLLVM to the ccache
whitelist.
Roland Schulz [Wed, 14 Sep 2022 03:15:49 +0000 (20:15 -0700)]
SYCL: Remove tidxz
Szilárd Páll [Thu, 15 Sep 2022 13:12:05 +0000 (13:12 +0000)]
Rename computation to activities in wallcycle
We measure computation, communication, as well as latency overheads (e.g.
GPU API overheads), so "computation" as a name is not a good fit.
Andrey Alekseenko [Thu, 15 Sep 2022 12:41:14 +0000 (12:41 +0000)]
Add missing header
It is needed for the VS2022 build.
(Whether the VS2022 build itself is needed is another question)
Berk Hess [Thu, 15 Sep 2022 11:47:41 +0000 (11:47 +0000)]
Add time as a transformation pull coord variable
This enables arbitrary time dependence for pull coordinates.
Time is not allowed as a variable when AWH is active, as AWH
assumes the system is in equilibrium. This is checked.
Tests are added.
Implements #4470
Closes #4470
Gaurav Garg [Thu, 15 Sep 2022 08:40:57 +0000 (08:40 +0000)]
Fix double-counting of PME cycle counters
- GPU operations launched in PME rank were considered as part of PP cycles as both PME and PP shared the same cycle counter
- Use different counter for PME GPU ops and make it part of the PME counters
Refs: #4490
Berk Hess [Thu, 15 Sep 2022 08:06:07 +0000 (08:06 +0000)]
Improve constraint test checks
Generated the reference data in double precision with LINCS
with niter=2, order=8.
Change the coordinate and velocity tolerances to absolute.
Changed the virial tolerance to relative to the average
diagonal element.
Refs #4586
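
The virial check can be sketched as follows. This is an illustration of the idea only, with hypothetical names; the test itself uses the GROMACS test utilities, which are not reproduced here. The tolerance is scaled by the average diagonal element of the reference, so near-zero off-diagonal elements are compared on a meaningful scale.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Matrix3 = std::array<std::array<double, 3>, 3>;

// Builds a matrix with one value on the diagonal and another elsewhere.
inline Matrix3 makeMatrix(double diagonal, double offDiagonal)
{
    Matrix3 m{};
    for (int i = 0; i < 3; ++i)
    {
        for (int j = 0; j < 3; ++j)
        {
            m[i][j] = (i == j) ? diagonal : offDiagonal;
        }
    }
    return m;
}

inline double averageDiagonal(const Matrix3& m)
{
    return (m[0][0] + m[1][1] + m[2][2]) / 3.0;
}

// Compares every element with an absolute tolerance derived from the
// average diagonal element of the reference matrix.
inline bool withinVirialTolerance(const Matrix3& ref, const Matrix3& value, double relTol)
{
    assert(relTol >= 0.0);
    const double absTol = relTol * std::abs(averageDiagonal(ref));
    for (int i = 0; i < 3; ++i)
    {
        for (int j = 0; j < 3; ++j)
        {
            if (std::abs(ref[i][j] - value[i][j]) > absTol)
            {
                return false;
            }
        }
    }
    return true;
}
```

A purely element-wise relative tolerance would reject any deviation in an element whose reference value is zero, which is the situation this scaling avoids.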
Paul Bauer [Wed, 14 Sep 2022 12:40:46 +0000 (14:40 +0200)]
Minor bump to CMake in Github CI
Changed 3.18.3 to minimum required 3.18.4
Joe Jordan [Wed, 14 Sep 2022 12:32:01 +0000 (12:32 +0000)]
Bump minimum cmake version to 3.18
Cmake 3.18.4 was released on 6 October 2020. The default version of cmake
in Ubuntu 22.04 is 3.22.1, so there should not be a problem there.
This is necessary because cmake 3.17 introduces find_package(CUDAToolkit),
which is a major step forward in usability over find_package(CUDA) or
enable_language(CUDA). Further, support for automatically adding compiler
flags for the CUDA version was first introduced in cmake 3.18.
This will also make packaging cuda code much easier, as
without this change, it would be necessary to handle the different
variables that are set by the old find_package(CUDA) and the newer
find_package(CUDAToolkit). This work would also be somewhat pointless
busywork since surely we would anyway bump to at least cmake 3.18 after
the release.
The actual changes to handling of cuda cmake code are left to a followup.
Relates to #3288, #4428 (closed)
Szilárd Páll [Wed, 14 Sep 2022 11:37:26 +0000 (11:37 +0000)]
Reduce the number of atomics in UB force accumulation
Improves bonded performance by up to a few percent.
Refs #3928
M. Eric Irrgang [Thu, 8 Sep 2022 16:02:19 +0000 (19:02 +0300)]
Expand feature checking API for Python.
Includes updates to Exception and feature documentation.
Paul Bauer [Wed, 14 Sep 2022 09:05:34 +0000 (09:05 +0000)]
Silence clang tidy warning in wallcycle code
Seems like it couldn't understand what the value of the max debug
counters is if we make it dependent on debugging being active or not.
M. Eric Irrgang [Wed, 7 Sep 2022 08:39:37 +0000 (11:39 +0300)]
Python code checkers.
Add some packages that we may use for automated Python code checking.
Ref #4595
M. Eric Irrgang [Wed, 14 Sep 2022 07:25:47 +0000 (07:25 +0000)]
Isolate the CMake installation stages.
The CMake installations are highly reusable and parallelizable.
We move them to their own stages primarily to improve build layer cache.
This way, we don't have to worry as much about what gets built before
what.
M. Eric Irrgang [Tue, 13 Sep 2022 17:46:56 +0000 (17:46 +0000)]
New base exception for _gmxapi extension module.
* Separate exports for wrapped exceptions and exceptions
originating in the extension module.
* Provide a C++ and Python base exception to be used
throughout the gmxapi._gmxapi extension module.
* Isolate the handling for libgmxapi exceptions from
different API versions.
Follow-up:
Code in the C++ extension module (::gmxpy namespace)
should throw locally defined exceptions instead of
gmxapi::Exception subclasses.
Paul Bauer [Tue, 13 Sep 2022 15:46:15 +0000 (15:46 +0000)]
Replace preprocessor with constexpr in wallcycle
Moves the debug functions to be checked through `if constexpr` instead of
hiding them behind preprocessor macros.
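
A minimal sketch of the pattern, assuming C++17. `SKETCH_DEBUG_COUNTERS`, `CycleCounterSketch` and the functions below are hypothetical and only illustrate the technique, not the wallcycle code itself.

```cpp
#include <cassert>

// Hypothetical compile-time switch; a build system would normally define it.
#ifndef SKETCH_DEBUG_COUNTERS
#    define SKETCH_DEBUG_COUNTERS 0
#endif
constexpr bool sc_useDebugCounters = (SKETCH_DEBUG_COUNTERS != 0);

struct CycleCounterSketch
{
    int startCount = 0;
    int debugDepth = 0;
};

// Always compiled and type-checked, even when debugging is disabled.
inline void debugStart(CycleCounterSketch* wc)
{
    wc->debugDepth++;
}

inline void start(CycleCounterSketch* wc)
{
    assert(wc != nullptr);
    // The branch is selected at compile time; no #if/#endif around the call.
    if constexpr (sc_useDebugCounters)
    {
        debugStart(wc);
    }
    wc->startCount++;
}
```

Unlike code hidden behind `#if`, the debug path here is seen by the compiler in every configuration, so it cannot silently stop compiling when the surrounding code changes.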
Andrey Alekseenko [Tue, 13 Sep 2022 14:14:37 +0000 (14:14 +0000)]
SYCL: Add GPU Bondeds kernel
Based on the CUDA implementation.
Tested on MI100 (gfx908) with hipSYCL and on V100 (sm_70) with hipSYCL and IntelLLVM.
Refs #3928
Anatoly Titov [Tue, 13 Sep 2022 13:04:07 +0000 (16:04 +0300)]
- separated classes to their own files
- successfully builds
Anatoly Titov [Tue, 13 Sep 2022 12:17:43 +0000 (15:17 +0300)]
- separation of classes to different files
Alexey Shvetsov [Tue, 13 Sep 2022 11:59:24 +0000 (14:59 +0300)]
Change name back
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Alexey Shvetsov [Tue, 13 Sep 2022 11:52:31 +0000 (14:52 +0300)]
Merge branch 'main' into densgrid
Alexey Shvetsov [Tue, 13 Sep 2022 11:51:54 +0000 (14:51 +0300)]
More changes
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Alexey Shvetsov [Tue, 13 Sep 2022 11:46:32 +0000 (14:46 +0300)]
Reformat and fix
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Anatoly Titov [Tue, 13 Sep 2022 11:03:59 +0000 (14:03 +0300)]
- removed std::pair / switched to indexes + &frame
Anatoly Titov [Tue, 13 Sep 2022 10:50:03 +0000 (13:50 +0300)]
- removed tuples
Anatoly [Mon, 12 Sep 2022 15:16:59 +0000 (18:16 +0300)]
small changes to class impl.
M. Eric Irrgang [Mon, 12 Sep 2022 14:14:13 +0000 (14:14 +0000)]
Decouple runtime resource management from legacy context.py.
* Remove some leftover debugging cruft.
* Additional clean-up and notes.
M. Eric Irrgang [Mon, 12 Sep 2022 11:52:26 +0000 (11:52 +0000)]
Update libgmxapi so version.
Packagers and other things may break if we don't update the libgmxapi version between GROMACS distributions.
Mark Abraham [Mon, 12 Sep 2022 10:10:20 +0000 (10:10 +0000)]
Move nrjac.cpp
This file did not depend on anything in linearalgebra module, so
should not be with it.
Resolves circular dependency of math and linearalgebra.
Refs #3288
Andrey Alekseenko [Mon, 12 Sep 2022 07:32:43 +0000 (07:32 +0000)]
Add missing header
Failed MSVC build: https://github.com/gromacs/gromacs/runs/8291215381
Fixed build: https://github.com/al42and/gromacs/runs/8293256050
Mark Abraham [Sun, 11 Sep 2022 09:28:00 +0000 (09:28 +0000)]
Remove ISerializer overloads for rvec
Finishes breaking circular dependency of math and utility.
The serialization interface should live at a low level, but if it has
methods for rvec and ivec then it has to depend on math module. Then
there is no clear module at a level higher than math but lower than
everything else in which ISerializer should live.
Since it is not necessary to have convenience methods for these types,
they are removed. They were being used in the fileio module, which now
implements that functionality using free functions.
Refs #3288
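
The layering argument can be sketched like this. The interface and types below are hypothetical, reduced stand-ins (not the real `ISerializer`): the low-level interface knows only scalars, and the vector convenience lives as a free function in the higher-level code that already depends on the vector type.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical low-level serializer interface: scalar types only,
// so it needs no knowledge of any vector type.
class ISerializerSketch
{
public:
    virtual ~ISerializerSketch() = default;
    virtual void doReal(float* value) = 0;
};

// Stand-in for a vector type defined in a higher-level (math-like) module.
using Vec3 = std::array<float, 3>;

// Free function owned by the higher-level caller, built on the scalar method.
inline void serializeVec3(ISerializerSketch* serializer, Vec3* v)
{
    assert(serializer != nullptr && v != nullptr);
    for (float& component : *v)
    {
        serializer->doReal(&component);
    }
}

// Trivial serializer that records what it is given, to exercise the sketch.
class RecordingSerializer : public ISerializerSketch
{
public:
    void               doReal(float* value) override { values.push_back(*value); }
    std::vector<float> values;
};
```

The dependency now points one way only: the code that knows about vectors depends on the serializer interface, not the reverse.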
Andrey Alekseenko [Sat, 10 Sep 2022 15:58:23 +0000 (15:58 +0000)]
Slightly modernize SYCL utils
- a workaround for a bug resolved almost a year ago,
- a device info handler deprecated in SYCL 2020, for which a modern alternative, `has`, has been available since late 2020,
- asserts in device code not hidden behind a macro.
Gaurav Garg [Fri, 9 Sep 2022 07:03:19 +0000 (07:03 +0000)]
Add release notes and installation details for GPU PME decomposition
Refs: #3884
Alexey Shvetsov [Fri, 9 Sep 2022 06:53:52 +0000 (09:53 +0300)]
One more emplace_back
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Alexey Shvetsov [Fri, 9 Sep 2022 06:50:10 +0000 (09:50 +0300)]
Reformat code with right format
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Alexey Shvetsov [Fri, 9 Sep 2022 06:48:47 +0000 (09:48 +0300)]
Update copyright
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Alexey Shvetsov [Fri, 9 Sep 2022 06:43:40 +0000 (09:43 +0300)]
Merge branch 'main' into densgrid
Andrey Alekseenko [Thu, 8 Sep 2022 22:52:00 +0000 (22:52 +0000)]
SYCL: Improve atomicFetchAdd helper
- Support arbitrary address space (e.g., local)
- Support Float3 data type
Refs #3928
Andrey Alekseenko [Thu, 8 Sep 2022 21:27:17 +0000 (21:27 +0000)]
Remove GMX_GPU_SYCL_USE_SUBDEVICES
New behavior:
- hipSYCL: we never split devices.
- oneAPI: we always try to split devices into tiles or use them as-is
if they cannot be divided.
Closes #4567
Andrey Alekseenko [Thu, 8 Sep 2022 20:48:57 +0000 (20:48 +0000)]
Remove remnants of source checksumming
Follow-up to 4d0b3d3b (!1920). The checksumming script was removed; now we
remove all references to it.
Note: if we want to bring checksumming back, CMake itself can calculate
hashes since 3.8, no need to depend on Python availability:
https://cmake.org/cmake/help/latest/command/file.html#hash
Refs #4133
Alexey Shvetsov [Thu, 8 Sep 2022 13:44:42 +0000 (16:44 +0300)]
Fix tidy
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>
Anatoly Titov [Thu, 8 Sep 2022 13:29:38 +0000 (16:29 +0300)]
potentially:
- private class pushed out
- function definitions pushed out of the classes
- changed "int a {0}" to "int a = 0"
- removed unnecessary "gmx::"
- cleaned constr./destr./virtual
Mark Abraham [Thu, 8 Sep 2022 11:16:50 +0000 (13:16 +0200)]
Resolve circular dependency of mdtypes and utility
This include was not used for anything
Refs #3288
Anatoly Titov [Thu, 8 Sep 2022 11:42:06 +0000 (14:42 +0300)]
- added forgotten .h
Anatoly Titov [Thu, 8 Sep 2022 11:40:13 +0000 (14:40 +0300)]
- initial commit to the gromacs package
Paul Bauer [Thu, 8 Sep 2022 09:41:25 +0000 (11:41 +0200)]
Fix path to release python script
This wasn't done when the script itself was moved.
Mark Abraham [Thu, 8 Sep 2022 07:24:20 +0000 (09:24 +0200)]
Move MDModules notifiers to high-level module
This partly resolves the circular dependency of math on utility.
The MDModules notifier functionality should live at a higher level. In
particular, the notification types should be in a utility module low
enough that e.g. grompp, mdrun, and the modules receiving the
notifications can all comfortably depend on them, and high enough that
they can depend on types in other utility modules. For example, RVec
and ArrayRef should be able to be used in notifications, but that
means the notification type must live above both math and utility
modules.
Refs #3288
Mark Abraham [Wed, 7 Sep 2022 18:32:22 +0000 (20:32 +0200)]
Break circular dependency of simd and hardware
There's no need for the SIMD support-level code to live in the SIMD module, and
moving it to hardware means it lives alongside all its callers.
Minimized content of simd_support.h
Made simd module into an INTERFACE library, now that it can be.
Refs #3288
ejjordan [Mon, 5 Sep 2022 18:34:16 +0000 (20:34 +0200)]
Set cuda language property in utility
This change sets the cuda language property for the one file that
needs that in utility module. It leaves the dependence between
libgromacs and utility intact, but may make it easier to sever this
dependence in the future.
Relates to #3288
Paul Bauer [Tue, 6 Sep 2022 07:39:55 +0000 (09:39 +0200)]
Make the pipeline trigger script more flexible
Allow selecting of regression tests to run against, and also pipeline
type.
Andrey Alekseenko [Wed, 7 Sep 2022 15:43:05 +0000 (15:43 +0000)]
Fix clang warnings in the release pipeline
Failed pipeline: https://gitlab.com/gromacs/gromacs/-/pipelines/632945242
Fixed pipeline: https://gitlab.com/gromacs/gromacs/-/pipelines/633561594
Andrey Alekseenko [Tue, 6 Sep 2022 16:28:04 +0000 (18:28 +0200)]
Fix a few dead links in docs
GoogleCode is shut down. GoogleMock is merged into GoogleTest.
M. Eric Irrgang [Wed, 7 Sep 2022 07:53:02 +0000 (07:53 +0000)]
Update minimum required GROMACS version for gmxapi Python package
Closes #4576
Andrey Alekseenko [Wed, 7 Sep 2022 07:18:51 +0000 (07:18 +0000)]
Fix clang-tidy
Introduced in 826bfa821a2aa64d8ef4e9418fd2eb65b6e1079a
Failed post-merge jobs: https://gitlab.com/gromacs/gromacs/-/jobs/2986724173, https://gitlab.com/gromacs/gromacs/-/jobs/2986724174
New post-merge pipeline: https://gitlab.com/gromacs/gromacs/-/pipelines/632612669
Paul Bauer [Tue, 6 Sep 2022 16:43:44 +0000 (16:43 +0000)]
Change upload paths for auto release script
Adjust script again for changes server side when it comes to the default
paths accessible for the upload accounts.
Berk Hess [Tue, 6 Sep 2022 15:34:40 +0000 (15:34 +0000)]
Move NBNxM diagonal masking to a class
Andrey Alekseenko [Tue, 6 Sep 2022 14:59:10 +0000 (14:59 +0000)]
GPU Bondeds: Extract backend-agnostic part
File renamed, a few CUDA-specific bits removed.
Refs #3928
Andrey Alekseenko [Tue, 6 Sep 2022 08:54:43 +0000 (10:54 +0200)]
Merge hwe-release-2022 into main
Conflicts:
- CMakeLists.txt
- admin/gitlab-ci/api-client.matrix/gromacs-master.gitlab-ci.yml
- admin/gitlab-ci/archive.gitlab-ci.yml
- admin/gitlab-ci/documentation.gitlab-ci.yml
- admin/gitlab-ci/gromacs.matrix/gromacs-gcc-9-cp2k-8.2-nightly.gitlab-ci.yml
- admin/gitlab-ci/gromacs.matrix/gromacs.clang-14-cuda-11.4.3.gitlab-ci.yml
- admin/gitlab-ci/gromacs.matrix/gromacs.clang-9-cuda-11.0-release.gitlab-ci.yml
- admin/gitlab-ci/gromacs.matrix/gromacs.clang-9.gitlab-ci.yml
- admin/gitlab-ci/gromacs.matrix/gromacs.gcc-11-coverage.gitlab-ci.yml
- admin/gitlab-ci/gromacs.matrix/gromacs.gcc-9-cuda-11.0.3-mpi.gitlab-ci.yml
- admin/gitlab-ci/gromacs.matrix/gromacs.gcc-9-cuda-11.0.3.gitlab-ci.yml
- admin/gitlab-ci/gromacs.matrix/gromacs.hipsycl-dev.gitlab-ci.yml
- admin/gitlab-ci/lint.gitlab-ci.yml
- admin/gitlab-ci/python-gmxapi.matrix/python-gmxapi-current-gromacs-mpi.gitlab-ci.yml
- api/gmxapi/cpp/tests/CMakeLists.txt
- api/gmxapi/cpp/workflow/tests/CMakeLists.txt
- cmake/gmxManageNvccConfig.cmake
- python_packaging/gmxapi/src/gmxapi/version.py
- src/gromacs/ewald/pme_coordinate_receiver_gpu_impl_gpu.cpp
- src/gromacs/ewald/pme_pp_comm_gpu_impl.cpp
- src/gromacs/gmxana/gmx_hbond.cpp
- src/gromacs/gmxpreprocess/readir.cpp
Manual updates:
- A bunch of GitLab YAML files to remove ':release-2020' suffix from images.
- src/gromacs/hardware/device_management_sycl_intel_device_ids.cpp
- src/gromacs/ewald/pme_force_sender_gpu_impl.h
- src/gromacs/ewald/pme_force_sender_gpu_impl_gpu.cu
Berk Hess [Tue, 6 Sep 2022 08:49:29 +0000 (08:49 +0000)]
Improve GPU pair search SIMD code
The GPU cluster pair distance SIMD kernel has been rewritten to
be templated and support normal SIMD width in addition to SIMD4.
This improves the performance on 8-wide SIMD platforms.
To enable this, the search coordinate packing order has been changed,
which also simplifies the packing code.
Paul Bauer [Tue, 6 Sep 2022 06:38:11 +0000 (08:38 +0200)]
Remove parenthesis for pragma unroll
Clang-14 complained about this.
Joe Jordan [Tue, 6 Sep 2022 06:17:13 +0000 (06:17 +0000)]
Make linearalgebra OBJECT and move files to include dir
Relates to #3288
Andrey Alekseenko [Mon, 5 Sep 2022 16:34:47 +0000 (16:34 +0000)]
CUDA Bondeds: simplify kernels, reduce shmem usage
We were not using `vtot` and `vtotVdw` simultaneously in the same warp, so we can reduce shmem and global atomic usage slightly.
We are only reducing within a warp, so we can reduce shmem usage even more by using shuffles instead.
Using V100, GCC 11.2, CUDA 11.5, benchMEM system: dynamic shmem went down from 636 to 540 bytes for all kernel flavors. But no differences in register usage or performance even for the energy kernel.
I'm a bit conflicted on this one because, on the one hand, why use more GPU resources than needed, but, on the other, perhaps we're better off not touching what's already good enough.
Berk Hess [Mon, 5 Sep 2022 15:21:11 +0000 (15:21 +0000)]
Increase the default values of nsttcouple and nstpcouple
The default values of nsttcouple and nstpcouple have been increased
from 10 to 100 to improve performance of GPU and parallel runs.
The mechanism to lower the values to ensure accurate integration
is now documented. Now v-rescale also uses at least 5 integration
steps per tau, like all other first-order coupling algorithms.
Also set nstcalcenergy to 100 instead of 10 when the user supplied
value is -1 (note that the default mdp value is 100).
Fixes #4565
Closes #4565
Mark Abraham [Mon, 5 Sep 2022 08:00:25 +0000 (10:00 +0200)]
Simplify DeviceStreamManager setup and usage
havePpDomainDecomposition is now a field in SimulationWorkload, so the
constructor can get the value from there. It also now stores the value
so that bondedStream() can return the correct stream without needing
the caller to know how to coordinate in the correct way.
Alexey Shvetsov [Mon, 5 Sep 2022 13:27:09 +0000 (13:27 +0000)]
Replace C style headers with their C++ versions
Signed-off-by: Alexey Shvetsov <alexxyum@gmail.com>