Mark Abraham [Mon, 4 Dec 2017 10:56:46 +0000 (21:56 +1100)]
Fix free_gpu
If a device context was not used, CUDA gives an error if we attempt to
clear it, so we must avoid clearing it.
Refs #2322
Change-Id: I67b8b2d263eaed9c7489a6de6f612b27496cc6c2
Berk Hess [Wed, 29 Nov 2017 16:43:18 +0000 (17:43 +0100)]
Fixed initial temperature reporting
Fixes #2314
Change-Id: I13dec05ede9b4ad976c22b4910ee02256dcaac74
Mark Abraham [Mon, 4 Dec 2017 10:15:15 +0000 (21:15 +1100)]
Fix unused variables in CPU pruning kernels
aj is unused when UNROLLJ != STRIDE, so should not be declared
Change-Id: I889e47a3d62ad5644a96c6d05403b4b9285975e4
Berk Hess [Mon, 4 Dec 2017 06:55:31 +0000 (07:55 +0100)]
Update mdrun signal help text
Updated mdrun help text on signal handling for old and recent changes
to the behavior.
Fixes #2324
Change-Id: I48dd30b7da3a1dc57331978c7d3b0e1509850187
Berk Hess [Sun, 3 Dec 2017 21:15:20 +0000 (22:15 +0100)]
Only stop at nstlist steps with -reprod
Stopping mdrun with two INT or TERM signals would always happen right
after the first global communication step. But this breaks exact
continuation. Now with mdrun -reprod a second signal will still stop
at a pair-list generation step, like with the first signal, so we can
still have exact continuation.
Refs #2318
Change-Id: If65c1215d2509d60c1c5a6444769e7809288e798
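The stopping rule described above can be sketched as a small decision function. This is a toy model, not the mdrun implementation: the function name, its parameters, and the assumption that global communication happens every step are illustrative only.

```python
def may_stop_at_step(step, nstlist, signals_received, reproducible):
    """Return True if a signalled run may stop at this step (toy model)."""
    if signals_received == 0:
        return False  # no stop was requested
    if signals_received == 1 or reproducible:
        # First signal, or any later signal with -reprod:
        # stop only at a pair-list generation step.
        return step % nstlist == 0
    # Second signal without -reprod: stop as soon as possible. Global
    # communication is assumed to happen every step in this toy model.
    return True
```

With nstlist = 10, a second signal at step 15 stops immediately without -reprod, but waits until step 20 with -reprod, which is what keeps exact continuation possible.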
Erik Lindahl [Wed, 29 Nov 2017 21:58:38 +0000 (22:58 +0100)]
Fix compilation issues for AVX-512
- gcc-5.4.0 incorrectly requires the second argument of
_mm512_i32gather_pd() to be a double pointer instead
of void, but this should fix compilation for both
cases.
- Work around double precision permute instruction
only available with AVX512VL instructions.
Fixes #2312.
Change-Id: I31420e71064b1c5c25c8af29a1d41c7f372375c1
Berk Hess [Sat, 2 Dec 2017 21:37:29 +0000 (22:37 +0100)]
Clear vsite velocities for simple integrators
The simple integrator loops (introduced in 69470fc4) do not clear
the velocities of virtual sites. This allows velocities of virtual
sites to slowly increase over time. To prevent this, velocities
of virtual sites are now cleared in a separate loop.
Fixes #2316
Change-Id: I12ff0fae2cd3c45ad4e63bfeccfc8c88505cdb1e
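A minimal sketch of the separate clearing loop mentioned above, assuming velocities are stored as a list of 3-vectors and each particle carries a type tag. The names ParticleType and clear_vsite_velocities are hypothetical and do not come from the GROMACS source.

```python
from enum import Enum


class ParticleType(Enum):
    ATOM = "atom"
    VSITE = "vsite"


def clear_vsite_velocities(velocities, particle_types):
    """Zero the velocities of virtual sites in place, after integration."""
    for index, particle_type in enumerate(particle_types):
        if particle_type is ParticleType.VSITE:
            velocities[index] = [0.0, 0.0, 0.0]


velocities = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
particle_types = [ParticleType.ATOM, ParticleType.VSITE]
clear_vsite_velocities(velocities, particle_types)
```

Keeping the clearing in its own pass leaves the integrator loop itself free of per-particle type checks.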
Mark Abraham [Sun, 3 Dec 2017 12:30:01 +0000 (23:30 +1100)]
Fix fft5d pinning
A CUDA build on a node with no driver installed can never have
selected a CUDA pinning policy, and erroneously unpinning leads to a
fatal error. Instead, FFT5D now remembers whether it made pinning
possible, which can only occur when there was a driver and a valid
device, so that it can unpin only when appropriate.
Removed some C++ guards and named a variable more precisely.
Noted a TODO to make a Jenkins configuration to test this code
path.
Fixes #2322
Change-Id: I50ae9cdeeb26ac0d0bd5ecf48b28b44cf0716745
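The idea of remembering whether pinning was made possible can be illustrated with a toy buffer owner. This is only a sketch under stated assumptions: the class and attribute names are hypothetical, and the unpin failure is simulated with a Python exception rather than a CUDA error.

```python
class GridBufferSketch:
    """Toy model of a grid buffer that unpins only what it pinned itself."""

    def __init__(self, size, have_driver_and_device):
        self.grid = [0.0] * size
        self.is_pinned = False
        # Remember whether this object made pinning possible at setup time.
        self.pinning_was_set_up = False
        if have_driver_and_device:
            self.is_pinned = True
            self.pinning_was_set_up = True

    def _unpin(self):
        # Stands in for the fatal error seen when unpinning without a driver.
        if not self.is_pinned:
            raise RuntimeError("attempted to unpin memory that was never pinned")
        self.is_pinned = False

    def release(self):
        # Unpin only when appropriate; on a driverless node this is skipped.
        if self.pinning_was_set_up:
            self._unpin()
            self.pinning_was_set_up = False
        self.grid = []


no_driver = GridBufferSketch(8, have_driver_and_device=False)
no_driver.release()  # does not raise, because nothing was pinned
```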
Berk Hess [Fri, 1 Dec 2017 13:03:11 +0000 (14:03 +0100)]
Tighten B-spline moduli single precision test tolerance from 6 to 1 ULP
Also get rid of the misused double precision tolerance helper.
Change-Id: I2babd8a7c5d4ab436e67a8b8d1ec0532a482ec94
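For readers unfamiliar with ULP-based tolerances, the following sketch shows one common way to measure the distance between two single-precision values in units in the last place. It is not the GROMACS test utility; the function names are illustrative, and NaN and infinity are not handled.

```python
import struct


def float32_to_ordered_int(value):
    """Map a value rounded to float32 onto an integer that is monotonic in value."""
    (bits,) = struct.unpack("<i", struct.pack("<f", value))
    # Negative floats are stored as sign plus magnitude; flip them so that
    # the integer ordering matches the floating-point ordering.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(a, b):
    """Number of representable float32 values between a and b."""
    return abs(float32_to_ordered_int(a) - float32_to_ordered_int(b))


def within_ulps(a, b, max_ulps=1):
    return ulp_distance(a, b) <= max_ulps
```

Under this measure, a value two representable steps away from the reference would pass a 6 ULP tolerance but fail a 1 ULP tolerance.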
Berk Hess [Thu, 30 Nov 2017 15:17:58 +0000 (16:17 +0100)]
Avoid assertion failure in AWH
With an unstable reaction coordinate or unequilibrated system, AWH
could cause an assertion to fail. Now AWH checks for valid coordinate
input and throws an exception with a clear message.
Change-Id: I059d9cd9fbff74fc096a9c1e4c16cf8d84b2118a
Viveca Lindahl [Thu, 30 Nov 2017 18:24:34 +0000 (19:24 +0100)]
Correct AWH input file name in documentation
Mdrun was expecting user input data file 'awhinit.xvg' while the
mdp-option documentation has 'awh-init.xvg'.
Change-Id: I66a4957d58a7808213029bb33ad1ab69cacc304f
Berk Hess [Mon, 27 Nov 2017 21:02:50 +0000 (22:02 +0100)]
Change the GPU SMT cut-off to quadratic
The advantage of SMT diminishes rapidly with the number of cores.
So the system sizes should be compared to the square of the number
of cores.
Change-Id: I58f2efb3bb70b039452822bb18865c734ba52189
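A quadratic cut-off of this kind could look like the sketch below. The function name, the direction of the comparison, and the constant are assumptions made for illustration; the commit message does not state the actual threshold.

```python
def smt_enabled_by_default(num_atoms, num_cores, atoms_per_core_squared=1000):
    """Hypothetical heuristic: keep SMT only for systems large relative to cores**2."""
    # atoms_per_core_squared is a made-up constant, not a GROMACS value.
    return num_atoms >= atoms_per_core_squared * num_cores ** 2
```

With these made-up numbers, a 100000-atom system keeps SMT on 4 cores (threshold 16000) but not on 16 cores (threshold 256000), reflecting the diminishing advantage of SMT as the core count grows.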
Erik Lindahl [Tue, 28 Nov 2017 18:14:23 +0000 (19:14 +0100)]
Fix AVX-512 SIMD test for C
Avoid using C++ features in the test, since
it should test both the C and C++ compilers.
Change-Id: Ia85d925faff87f39bc7ae34c8eac813bd45a7e37
Mark Abraham [Fri, 24 Nov 2017 20:43:38 +0000 (21:43 +0100)]
GROMACS 2018 first beta release
Change-Id: I153d464d62e99f429dd31accf21bae033da4b61c
Berk Hess [Tue, 12 Sep 2017 12:42:32 +0000 (14:42 +0200)]
Refactor nbnxn exclusion setting
Consolidate common parts of the simple and GPU exclusion mask
generation code. Made variable names more descriptive.
No functional or performance changes, except that the direct
j-cluster lookup now also works when the first j-cluster does not
equal the i-cluster.
Change-Id: I3ef6344ae2796e649ae30bf5ff0668a4548c011f
Aleksei Iupinov [Tue, 28 Nov 2017 09:46:40 +0000 (10:46 +0100)]
Replace pmeGPU variable name with pmeGpu
Sticking to one spelling is hard.
find ./src/gromacs/ewald -type f -exec sed -i 's/pmeGPU/pmeGpu/g' {} ';'
Change-Id: I80290027464343a034bfc0194fa2340663c2cbd0
Szilárd Páll [Tue, 28 Nov 2017 00:29:15 +0000 (01:29 +0100)]
Separate local GPU NB wait from emulation launch
Refactoring splits the combined nonbonded GPU and emulation path
conditional in preparation for the polling/alternating wait.
Change-Id: I685fb0ca5f72b92efb510d281c1078011f0c8b16
Szilárd Páll [Tue, 28 Nov 2017 00:08:15 +0000 (01:08 +0100)]
Refactor GPU rolling prune launch
Move the launch call and related logic to a separate function.
Change-Id: Iba4d331fa0aa14b95a43d752d04ea3e4833667a2
Erik Lindahl [Fri, 24 Nov 2017 19:26:56 +0000 (20:26 +0100)]
Improve status messages for SIMD checks
The present SIMD checks always include a blank flag as the last
variable. This will (obviously) always work as a compiler flag,
and then we instead issued a status message saying a compiler
flag was found, but updating binutils might help. This alters
the error message to reflect that no flag was found, and we are
intentionally unspecific and suggest updating the compiler and/or
binutils.
Change-Id: If99aeac391adc709c295e71b94583a8f9e395b0e
Roland Schulz [Mon, 27 Nov 2017 19:46:00 +0000 (11:46 -0800)]
Fix bugs introduced in c4cd996
Flags were being appended to the cache every time, causing an endless
loop with cmake 3.6.3 and ninja 1.7.2. Appending is not necessary: the
only multi-step case (AVX128/FMA) appends manually anyway. Flags were
appended by gmx_find_flags before c4cd996, but the usage of it caused
the flags to be appended in the cache. This bug-fix changes
gmx_find_flags rather than the usage because functions shouldn't
append outputs.
Change-Id: I0d9ae1edae937fe871f0085894858746778ffe0f
Szilárd Páll [Mon, 27 Nov 2017 01:40:00 +0000 (02:40 +0100)]
Time FFT/gather in mixed mode
Add separate cycle counters, because the main PME ones are not practical:
they are treated in a special way and assumed to be part of the separately
accounted-for "PME mesh" section.
Change-Id: I189f256e7df24ff34420edfa8f6a3729709930fa
Aleksei Iupinov [Mon, 27 Nov 2017 15:04:41 +0000 (16:04 +0100)]
Do not force separate PME rank to recompute reciprocal box every step
Change-Id: I5015a13ecda6b14831619d7fb1e78fde4eb2002d
Erik Lindahl [Fri, 24 Nov 2017 19:45:24 +0000 (20:45 +0100)]
Test more AVX-512F instructions before enabling support
Test instructions that are not implemented on
earlier versions of clang, and fix a bug where the
KNL version of the avx test was called by mistake due
to similar names.
Change-Id: I47bbaf363feac9f6b34276295508e147d0c9e3e8
Aleksei Iupinov [Mon, 27 Nov 2017 14:20:51 +0000 (15:20 +0100)]
Disable failing separate PME rank tests
Change-Id: I2a954c766b06dcc99bf69c27fa1d56f275454adf
Aleksei Iupinov [Mon, 27 Nov 2017 09:50:23 +0000 (10:50 +0100)]
Add tests for PME tuning and fix mixed PME mode with tuning
Fixes #2303
Change-Id: Id9acd3d9b01b585d772b2009dcea9a9e0b8b0539
Berk Hess [Sat, 25 Nov 2017 08:57:53 +0000 (09:57 +0100)]
Enable auto thread pinning with thread limiting
Recently the possibility of automatically limiting the number of OpenMP
threads was introduced (with PME on the GPU). Note that this could
already happen with very small systems. When not using all hardware
threads, pinning would be disabled and a warning was issued.
Now pinning is enabled and no warnings are issued when the user did
not specify any thread and pinning settings.
Changed "threads" in the mdrun -ntmpi description to "ranks".
Change-Id: I5f5688a4e2d35fdddbae3aeffae256158a13da5c
Paul Bauer [Mon, 13 Nov 2017 15:09:02 +0000 (16:09 +0100)]
Add GPU task information to user-guide
Started addition of information to the user guide concerning different
tasks that can be assigned to GPUs.
Change-Id: Ia83bf348620e51b0dac9bc186bde8b8a55479081
Berk Hess [Mon, 27 Nov 2017 07:51:54 +0000 (08:51 +0100)]
Fix PME GPU with walls
The unscaled box was used to compute the volume used by PME solve.
Change-Id: I631774ce1a237b5072f0df22741aaf6c116c8e4f
Berk Hess [Sat, 25 Nov 2017 20:46:05 +0000 (21:46 +0100)]
Fix compiler warning without MPI
Change-Id: I07675d1b5bd4ffbf8e4a29db9a95f721e3051ae5
Aleksei Iupinov [Fri, 24 Nov 2017 21:15:33 +0000 (22:15 +0100)]
Destroy PME GPU context
PME-only GPU rank change I82f3950b0e04b0bc21843a8124a9bd5c68b15024
forgot to do this.
Change-Id: I2f5deb35b33b5c8fb4fb15bd4f4fbd798838a87d
Aleksei Iupinov [Thu, 16 Nov 2017 17:07:05 +0000 (18:07 +0100)]
Enable separate PME GPU rank
This relaxes a few task assignment restrictions to allow
a simulation with a single separate PME GPU rank to work. E.g.
gmx mdrun -pme gpu -ntmpi 4 -npme 1
works if all the single rank PME GPU requirements are satisfied.
Default behavior is not changed, new behavior is opt-in.
The existing single-rank PME command line sanity test is altered
to also be included in multi-rank tests, testing a separate PME rank.
Change-Id: I82f3950b0e04b0bc21843a8124a9bd5c68b15024
Aleksei Iupinov [Thu, 23 Nov 2017 14:02:22 +0000 (15:02 +0100)]
Do not call mixed CPU+GPU PME mode "Hybrid"
Change-Id: I2739d366f819c1d50bed10f40e887ba571fa56e4
Aleksei Iupinov [Wed, 22 Nov 2017 09:52:40 +0000 (10:52 +0100)]
Activate mixed mode of PME on GPU
This allows the mixed PME mode (spread/gather on GPU, FFT/solve on CPU)
to be triggered manually with -pme gpu -pmefft cpu. Behavior of other
values of -pme and -pmefft is not altered. Testing is enabled as well.
Change-Id: I2a525e2a7003a392629b2ca2ed4051d9245dac1b
Berk Hess [Fri, 24 Nov 2017 15:09:22 +0000 (16:09 +0100)]
Disabled CUDA timings
CUDA timings are incorrect with multiple streams and currently
we can not query for other streams in the non-bonded or PME module,
so disabled them by default. Added a GMX_ENABLE_GPU_TIMING env.var.
Removed deprecated env.vars.
Change-Id: I55ab98d7fea8fa90782e8346ad73b2d2a2171a1d
Szilárd Páll [Wed, 22 Nov 2017 02:09:00 +0000 (03:09 +0100)]
Add GPU sharing post-submit config
Also added some TODOs to the top of the list with the thinking that
these would be beneficial to test code that we're changing now (e.g.
memory management code).
Change-Id: Ic8ed571f571d6755fecb753d04a73f25f507ad22
Mark Abraham [Fri, 20 Nov 2015 19:43:58 +0000 (20:43 +0100)]
Update developer guide
Imported and updated more material from the wiki. Included coverage of
some recent discussion points on C++11 and preprocessor use.
Change-Id: I72dc1e656f329fe7de9feadc77de84bc39b6eb29
Mark Abraham [Fri, 3 Nov 2017 02:20:21 +0000 (03:20 +0100)]
Activate PME on GPUs
Adds support for a run with a single rank, running NB and PME on the
same GPU.
Fixed bug in thread-MPI gather and gatherv functions. Real MPI does
not require that the send buffer is non-NULL if the send count is
zero, and the thread-MPI docs have not documented that this is an
intended extra requirement. This would be a problem only if e.g.
memcpy(dest,src,0) itself dereferenced src.
TODO user docs
Change-Id: Iace9e720c0958eaacaaa81307f1fe7324fa4c9ac
Aleksei Iupinov [Fri, 24 Nov 2017 14:20:50 +0000 (15:20 +0100)]
Remove outdated (already implemented) pinning TODO
Change-Id: I122b93497aa168aeb354bd63059398c1738fd110
Berk Hess [Tue, 7 Nov 2017 11:23:09 +0000 (12:23 +0100)]
Speed up nbnxn buffer clearing
Change-Id: If1b58589aee4022cff9e6f120526b4e844967501
Mark Abraham [Fri, 3 Nov 2017 01:33:01 +0000 (02:33 +0100)]
Separate management of GPU contexts from modules
Tasks from modules might share GPU contexts across either tasks, or
thread-MPI ranks, so init and free operations can't be the
responsibility of the modules themselves.
Simplified the error reporting for init and free. Knowledge of the
rank ID might help in diagnosing issues in some cases, but should
(later) be the responsibility of a proper framework to catch errors
during initialization across MPI ranks.
Moved the GPU profiling cleanup back to where it was intended to be,
before some earlier refactoring had left it somewhere not-quite-right.
Change-Id: I682a1b1c7058cbebb41805dba05e688cbee18c2a
Erik Lindahl [Mon, 13 Nov 2017 17:48:05 +0000 (10:48 -0700)]
Detect AVX-512 FMA units to choose best SIMD
Add a test program that times AVX-512 code to
detect single vs. dual AVX-512 FMA units.
Added CMake code to always compile this file
with AVX-512 flags, both at CMake configuration
and runtime.
Tested to work on both AVX2 and AVX-512 hardware
with dual FMAs, and by manually faking single
FMA units, but when we get access to hardware
with a single AVX-512 FMA unit we need to check
that we produce the correct result.
Change-Id: I6240e864bc77f95085c5cd3303a84ab581eb3662
Mark Abraham [Thu, 2 Nov 2017 18:51:16 +0000 (19:51 +0100)]
Add PME integration test
As support for executing PME on GPUs is integrated, this test will
make a simple way to ensure things work as expected regardless of
what build type and hardware is present.
Change-Id: I2ab95b6f84eef9fe18b6858a8886221706706a14
Mark Abraham [Wed, 23 Aug 2017 14:06:44 +0000 (16:06 +0200)]
Extend task assignment code
Existing behaviour is largely unchanged, apart from some details of
how conditions that prevent task assignment are handled, and when.
However it is not feasible in the longer term to continue to implement
a way for gmx mdrun -gpu_id to imply the thread-MPI rank split, so
that is disabled now, along with a useful error message. Instead, for
both real and thread MPI, -gpu_id now limits the available GPU IDs
(issuing an error if there are any duplicates), somewhat like
CUDA_VISIBLE_DEVICES. The new mdrun -gputasks option specifies a full
GPU task assignment, and must be accompanied by a choice of ranks and
what kind of device receives tasks of each type. Documentation is
updated accordingly.
Aspects of the implementation anticipate the extension to support
long-ranged PME interactions on GPUs, and others in future, so that
the task assignment on a node now takes the form of a container of
tasks, potentially of different types, on each rank of the node. A
flat vector of ints is no longer sufficient.
Errors e.g. from inconsistent user input are now handled with
exceptions, so that the runner can take the responsibility of
reporting those correctly, rather than always aborting the program at
the point where the issue is detected.
gmx tune_pme now explicitly only supports the new form of -gpu_id,
though it would not be difficult to support -gputasks if there
was need.
Change-Id: I0c149913bd43418d374171f5f95dad7f25d3cfe4
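The two ideas above (rejecting duplicate GPU IDs, and reporting input errors through exceptions that the runner handles) can be sketched as follows. This is an illustration only: the exception name, the function names, and the accepted string formats are assumptions, not the mdrun implementation.

```python
class InconsistentUserInputError(ValueError):
    """Hypothetical exception for invalid user input (name is an assumption)."""


def parse_available_gpu_ids(gpu_id_string):
    """Return the GPU IDs named in the string, rejecting duplicates."""
    # Assumed formats: comma-separated IDs ("0,2") or a run of digits ("02").
    if "," in gpu_id_string:
        tokens = gpu_id_string.split(",")
    else:
        tokens = list(gpu_id_string)
    ids = []
    for token in tokens:
        if not (token.isascii() and token.isdigit()):
            raise InconsistentUserInputError(f"invalid GPU ID {token!r}")
        ids.append(int(token))
    if len(set(ids)) != len(ids):
        raise InconsistentUserInputError(f"duplicate GPU IDs in {gpu_id_string!r}")
    return ids


def report_gpu_selection(gpu_id_string):
    """Stand-in for the runner: it, not the parser, decides how to report errors."""
    try:
        return f"available GPUs: {parse_available_gpu_ids(gpu_id_string)}"
    except InconsistentUserInputError as error:
        return f"error: {error}"
```

Raising at the point of detection and reporting in one place means the parser never has to abort the program itself.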
Aleksei Iupinov [Fri, 24 Nov 2017 10:55:16 +0000 (11:55 +0100)]
Mention gmx_install_headers() in the developer manual
Change-Id: I703e63b7e8978f23963107044396331cbfe94d7e
Aleksei Iupinov [Tue, 21 Nov 2017 17:20:40 +0000 (18:20 +0100)]
Make FFT real-space grid use CUDA pinning for mixed PME mode
Change-Id: Ic0decb839285cfffd0de6da667e99a08b201183a
Berk Hess [Thu, 23 Nov 2017 15:07:34 +0000 (16:07 +0100)]
Limit SMT with PME on GPU
For small numbers of atoms per core, SMT can seriously deteriorate
performance when running both non-bondeds and PME on GPU.
With fewer than 10000 atoms per core, SMT is now always off by default
with PME on GPU and auto settings.
Change-Id: I1a6b83bc81f68e89bf443e2b0ddb1fde44e2361d
Aleksei Iupinov [Mon, 20 Nov 2017 17:14:30 +0000 (18:14 +0100)]
Pin coordinates, forces and charges for PME on GPU
This pins almost all the required buffers for both PME-only and PP+PME
ranks. The only non-pinned buffer is the FFT grid for not yet enabled
mixed mode. ChargesA in md_atoms now check for PME actually being
used on the rank to avoid the unpin assertion.
Change-Id: Ia4c1b7673cc5c32c8d3d3fcf252d9d923b94128d
Berk Hess [Fri, 13 Oct 2017 12:15:55 +0000 (14:15 +0200)]
Update note in manual on SD
The comment in the SD section about Berendsen was outdated.
Added a few sentences on equilibration/damping of modes.
Change-Id: I12595e881572526637019879ff0bb8ef872e57d6
Berk Hess [Thu, 23 Nov 2017 13:28:31 +0000 (14:28 +0100)]
Correct PME GPU timing naming
Change-Id: I4a39d246cecbfc7a84608dd9c450d57393f1248d
Szilárd Páll [Thu, 23 Nov 2017 11:51:17 +0000 (12:51 +0100)]
Move GPU FFT/gather launch after GPU nonbondeds
Changing the scheduling order makes sure that the launch overhead of GPU
FFTs does not keep the CPU busy, preventing nonbonded launch and its
overlap with PME.
Change-Id: I85cdb97cd7d7be6f5314da3a69d68616d8a93092
Pascal Merz [Tue, 28 Mar 2017 07:27:40 +0000 (01:27 -0600)]
Proof of concept: Physical validation suite
This is a proof of concept for a physical validation suite for GROMACS
(https://redmine.gromacs.org/issues/2070). It is kept separate from the
remaining code checks, for a number of reasons:
* Like the regression tests, it is separate from the code, i.e. it
calls the complete binary from an external script.
* Unlike the regression tests, we are not comparing to the results from
earlier version, but to what is expected physically / mathematically.
* Unlike the existing tests, we are not able to keep these tests in the
"seconds, not minutes" time frame, rather aiming for "hours, not days".
They should therefore be run periodically, but probably not for every
build.
Also, given the long run time, it will in many cases be necessary to
separate running of the systems (e.g. to run it at a specific time, or
on a different resource), such that the make script gives the option
to
- prepare run files and an execution script,
- analyze already present simulations,
- or prepare, run and analyze in one go.
* Since those tests cannot be run at the same frequency as the current
tests, they are kept strictly opt-in via `-DGMX_PHYSICAL_VALIDATION=ON`
(see below for more details).
* Different tests need different simulation run-times - convergence tests
need significantly shorter simulations than statistical ensemble tests.
The tests could hence be run at different frequency, with short tests
being integrated in the Jenkins environment, and longer tests being
triggered manually only.
USAGE WITH CMAKE / CTEST
* Run `cmake` in your build directory as usual.
* Run make:
* All previously existing build targets are unchanged, including
`make check`
* `make check-phys` builds the main binaries, then runs the physical
validation tests, expecting the simulation to have already run.
* `make check-all` is the combination of `make check` and
`make check-phys`. It builds the main binaries and the test
targets, then runs the Google tests, the regression tests and the
physical validation tests.
* `make check-phys-prepare` builds the main binaries, then prepares
the validation tests by creating a subdirectory
tests/physicalvalidation in the build directory, which contains the
GROMACS input files as well as a bash file `run_simulations.sh` to
run the necessary simulations outside of the build environment.
* `make check-phys-run` prepares and runs the necessary simulations,
and runs the tests once the simulations are finished.
A typical sequence of commands to run the physical validation tests would
hence either be
> cmake -DGMX_PHYSICAL_VALIDATION=ON ..
> make
> make check-phys-prepare
> # run simulations, possibly on external resource
> make check-phys
or, to run everything locally (WARNING - can take several hours!)
> cmake -DGMX_PHYSICAL_VALIDATION=ON ..
> make
> make check-phys-run
* Keeping the current `make check` target unchanged required some
changes to the CMake files in the `tests/` directory.
* Note that currently `make check-phys-run` needs just under one hour
to run on my workstation (10-core Xeon processor)
ADDED FILES
All additional files are located in the `tests/physicalvalidation` folder
or one of its subfolders.
* The physical validation is done with a python script (python3, but
fully backwards compatible to 2.7), which can also be called
independently of Ctest. The script is called `gmx_physicalvalidation.py`
and is relatively extensively documented. Try running it with `-h` and
`--tests` to see the available options.
* The input files for the systems are stored in the `systems/` subfolder,
containing a .top file containing the topology, a .gro file containing
the starting structure, and a .mdp file containing the input options.
The .mdp files are kept as minimal as possible to be sensitive to changes
in GROMACS standard values.
* The physical validation script and the CMake files get information about
the systems via the `systems.json` (single precision) and `systems_d.json`
(double precision) files. These json-formatted files contain the system
names, the tests to be performed on the systems, as well as additional
options where needed.
* The actual computation of the tests is performed by a python package
stored in the `physical_validation` folder. This is simply a snapshot of
our code-independent physical validation package.
SYSTEMS:
Currently, the script is only running a few systems, checking convergence
of energy conservation in NVE system with decaying timestep, and the
ensembles generated by a few thermostatting and barostatting algorithms. We
have systems ready covering a broader combination of settings, but kept the
list shorter for this proof-of-concept to keep runtime short.
OPT-IN VIA `-DGMX_PHYSICAL_VALIDATION=ON`
* Physical validation tests are now strictly opt-in via
`-DGMX_PHYSICAL_VALIDATION=ON`, as discussed in the developer telco.
* If `-DGMX_PHYSICAL_VALIDATION=OFF` (default value),
* `make check`, `make test` and a direct call to `ctest` are
identical to before the inclusion of physical validation, i.e.
they run google tests and (if available) the regression tests.
* `make check-phys` does not run any tests, but issues a note that
nothing will be checked since `-DGMX_PHYSICAL_VALIDATION=OFF`.
* `make check-all` is identical to `make check`, but issues a note
that physical validation is not available since
`-DGMX_PHYSICAL_VALIDATION=OFF`.
* If `-DGMX_PHYSICAL_VALIDATION=ON`,
* `make check` is identical to before the inclusion of physical
validation, i.e. it runs google tests and (if available) the
regression tests.
* `make test` and a direct call to `ctest` run all available tests,
i.e. google tests, possible regression tests, physical validation
tests.
* `make check-phys` runs only the physical validation tests.
* `make check-all` runs all available tests like `make test`, with
the difference that the tests and all dependencies are built
before running the tests.
Since `GMX_PHYSICAL_VALIDATION` is `OFF` by default, existing setups
should see no change in behavior.
Change-Id: I3155da64b7dfb43a42236030dce1399d36cb9655
Viveca Lindahl [Wed, 22 Nov 2017 10:15:50 +0000 (11:15 +0100)]
Fixed typo and corrected units in the friction tensor equation.
The friction should have units of energy*time/length^2.
Added unit of beta makes the units correct.
Change-Id: I9457e4724cefe87258ccc7f663982b349d4d4219
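A quick dimensional check of the stated units, assuming the friction tensor has the form of beta times a time-integrated force correlation (the related AWH commits describe it as a time-integrated force correlation; the exact expression is in the manual, not in this log). Here E, L and T stand for energy, length and time, beta has units 1/E, and a force has units E/L:

```latex
\left[ \beta \int_0^\infty
  \langle \delta F_\mu(t) \, \delta F_\nu(0) \rangle \, dt \right]
= \frac{1}{E} \cdot \frac{E^2}{L^2} \cdot T
= \frac{E \, T}{L^2}
```

Without the factor beta, the same expression would carry units of energy^2*time/length^2, which is why adding it makes the units come out right.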
Viveca Lindahl [Tue, 21 Nov 2017 15:58:15 +0000 (16:58 +0100)]
Added friction tensor to AWH manual section
This also changes the lambda dimensional index from 'd' to 'mu'
for the whole AWH section.
Also fixes a typo.
Change-Id: I09203d7eabcb23482476c6d16c8dd75f5832a662
Mark Abraham [Thu, 9 Nov 2017 00:23:01 +0000 (17:23 -0700)]
Use much less PaddedRVecVector and more ArrayRef of RVec
Only code that handles allocations needs to know the concrete type of
the container. In some cases that do need the container type,
templating on the allocator will be needed in future, so that is
arranged here. This prepares for changing the allocator for state->x
so that we can use one that can be configured at run time for
efficient GPU transfers.
Also introduced PaddedArrayRef to use in code that relies on the
padding and/or alignedness attributes of the PaddedRVecVector. This
keeps partial type safety, although a proper implementation of such a
view should replace the current typedef.
Had to make some associated changes to helper functionality to
use more ArrayRef, rather than rely on the way rvec pointers could
decay to real pointers.
Used some compat::make_unique since that is better style.
Change-Id: I1ed3feb016727665329e919433bece9773b46969
Aleksei Iupinov [Tue, 21 Nov 2017 11:02:24 +0000 (12:02 +0100)]
Fix CUDA release build warning
Change-Id: I840c991f4eef394071626bfdcfcd14192e97a778
Aleksei Iupinov [Tue, 21 Nov 2017 07:19:26 +0000 (08:19 +0100)]
Fix pinned memory status checker tests
Change-Id: I69df298c5c022a21fe7e96d201adfe1d4dfb8563
Erik Lindahl [Sat, 11 Nov 2017 23:04:20 +0000 (16:04 -0700)]
Separate SIMD flag detection from management
This provides separate cmake functions to detect the flags
required to compile each SIMD instruction set. These
functions do not automatically add the SIMD flags to the
default flags, in order to make it possible to detect flags
for many SIMD instruction sets and use different flags for
different files.
Change-Id: I469a0cf0fafe3793d14dbcf0e72fd18f44a3bfe8
Mark Abraham [Sat, 11 Nov 2017 19:50:20 +0000 (12:50 -0700)]
Support pinning in HostAllocator
We want the resize / reserve behaviour to handle page locking that is
useful for efficient GPU transfer, while making it possible to avoid
locking more pages than required for that vector. By embedding the
pin()/unpin() behaviour into malloc() and free() for the allocation
policy, this can be safely handled in all cases.
Additionally, high-level code can now choose for any individual vector
when and whether a pinning policy is required, and even manually
pin and unpin in any special cases that might arise.
When using the policy that does not support pinning, we now use
AlignedAllocator, so that we minimize memory consumption.
Change-Id: I807464222c7cc7718282b1e08204f563869322a0
Viveca Lindahl [Mon, 28 Mar 2016 12:34:35 +0000 (14:34 +0200)]
Add force correlation to AWH module
This change adds the calculation of force correlation statistics
during an AWH biased simulation.
The main quantity of interest is the time-integrated force
correlation, also known as the friction tensor
(see e.g. http://dx.doi.org/10.1103/PhysRevLett.108.190602).
The friction tensor defines a metric on the coordinate space and
the local volume element of this metric is a useful measure for
determining which regions need more or less sampling.
gmx awh prints the friction (tensor) and can also still process
energy files without friction data.
Change-Id: I164be4665004dea5b250e3c7ac135ac1c1cbd783
Paul Bauer [Wed, 11 Oct 2017 14:19:22 +0000 (16:19 +0200)]
First general additions to the documentation
Made a general pass through the documentation and fixed a number
of issues I found with links not being where they should be
or pointing nowhere.
Also made a first attempt at parsing the log files from the linkchecker
script in the build test.
Change-Id: I29fabd0b824d90f205c13855a85b139f7b789da3
Viveca Lindahl [Mon, 28 Mar 2016 12:31:42 +0000 (14:31 +0200)]
Add reading and writing to AWH module
This change adds IO to the AWH module. AWH writes coordinate
free energies and distributions to an energy file block. The
reading is handled by a new tool gmx awh.
Change-Id: Ie30991bca376c2a648371db771fc5dfd8fca3715
Berk Hess [Mon, 23 Oct 2017 07:39:32 +0000 (09:39 +0200)]
Prevent PME tuning excessive grid scaling
We limit the maximum grid scaling to a factor 1.8. This allows
plenty of room for shifting work from PME on CPU to short-range
GPU kernels, but avoids excessive scaling for diminishing return
in performance for a significant increase in power consumption,
communication volume (which may with fluctuating network load not
show up during tuning) as well as limiting load balancing.
Change-Id: I85c02478faa6b67c063b6e1b45a9ac1755b2d81e
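The cap can be pictured as a filter over candidate grid setups. The sketch below is illustrative only: the factor 1.8 comes from the commit message, while the function name and the representation of a setup by its grid spacing are assumptions.

```python
MAX_GRID_SCALING = 1.8  # limit stated in the commit message


def allowed_grid_spacings(base_spacing, candidate_spacings,
                          max_scaling=MAX_GRID_SCALING):
    """Keep only candidates scaled by at most max_scaling relative to the base."""
    limit = base_spacing * max_scaling
    return [spacing for spacing in candidate_spacings
            if base_spacing <= spacing <= limit]
```

For a base spacing of 0.1, candidates 0.15 and 0.17 remain available to the tuner while 0.2 (a factor 2.0) is excluded.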
Mark Abraham [Sun, 19 Nov 2017 21:50:03 +0000 (22:50 +0100)]
Fix nightly matrix config
Since we added cmake warning detection, this has been failing, because
the older clang versions do not have openmp support.
Change-Id: Idbc47fd9ce3c5f329fb20ac36da48ea932f6ac03
Berk Hess [Sun, 12 Nov 2017 12:10:54 +0000 (13:10 +0100)]
Add SIMD for AWH
When using the convolved potential with AWH, a large number of double
precision exp() functions need to be evaluated at every step. These
are now SIMD accelerated.
Change-Id: If1e3a916469c4fd7e26740123009ae59b7927667
Viveca Lindahl [Sun, 12 Nov 2017 21:46:40 +0000 (22:46 +0100)]
Add AWH section to manual.
Brief theoretical background and practical aspects of AWH.
Change-Id: Ic794006b00b107b6c9d22a1cfabff4646c87ae4e
Viveca Lindahl [Mon, 28 Mar 2016 12:27:33 +0000 (14:27 +0200)]
Add AWH biasing module + tests
The AWH (Accelerated weight histogram) method is an adaptive biasing
method used for overcoming free energy barriers and calculating
free energies (see http://dx.doi.org/10.1063/1.4890371). Although
AWH can in general bias any system parameter, this change only
implements biasing of reaction coordinates. The actual force
distribution and coordinate handling is taken care of by the pull
code. AWH interacts with the pull code by registering itself as
the external potential module for the coordinate that should be
AWH biased. The AWH code sets the potential and force for those
coordinates.
The Grid test checks that the neighborhoods are correct.
The Bias tests check the force, bias and free energy values
for the final and initial phase, with MC and convolved force
and without and with skipping updates.
Change-Id: I202f58f7042e8e63c9d708fdcaca6da7e8a4022e
Mark Abraham [Tue, 24 Oct 2017 15:40:58 +0000 (17:40 +0200)]
Fix FindCUDA.cmake for -Wundef
CUDA 9.0 issues large numbers of -Wundef warnings from its internal
headers. FindCUDA.cmake should be including such headers as
"system" headers, so it is modified to do that.
Fixes #2276
Change-Id: I36ca1208a0597215bf2f1e38d849786e9ac2fed4
Magnus Lundborg [Tue, 14 Nov 2017 11:06:09 +0000 (12:06 +0100)]
Added Magnus Lundborg as contributor in the manual.
I noticed I was not listed so I shamelessly added myself.
Change-Id: Idf589cabc319dab20abb1a40d73ce14692e71de2
Aleksei Iupinov [Tue, 7 Nov 2017 12:40:17 +0000 (13:40 +0100)]
Asynchronous CUDA transfers are asserted to always use pinned memory
A few tests are also added for the memory status checker function.
Change-Id: Ifd6c9b2a0ad8ad8c325dc553534314e81e5d8bd7
Mark Abraham [Wed, 8 Nov 2017 11:25:41 +0000 (12:25 +0100)]
Use HostAllocationPolicy for chargeA
Change-Id: Id230a0571ff8e4f8f69e347a3aea4e3c44413676
Mark Abraham [Fri, 17 Nov 2017 18:58:39 +0000 (11:58 -0700)]
Add checking function for whether a buffer is pinned
This is useful for several kinds of tests proposed.
Change-Id: If9fdd29e73f16299190b5485f473f6388aab9ec9
Mark Abraham [Mon, 6 Nov 2017 08:28:01 +0000 (09:28 +0100)]
Improve handling of PME GPU force buffer
Managed it with the HostAllocator, and moved the responsibility
for its lifetime to the PME GPU staging structure. The buffer
does not use CUDA pinning yet.
Change-Id: Ia6fdbdb2509137fec1c6cf2a4ac8c04b1696b58f
Aleksei Iupinov [Fri, 10 Nov 2017 15:04:35 +0000 (16:04 +0100)]
Make gpu_utils-test build with GMX_CLANG_CUDA
Same workarounds are applied to libgpu_utilstest as for libgromacs.
Renamed libgpu_utilstest target to gpu_utilstest_cuda to avoid the
double "lib" prefix in the filename.
Refs #2259, #2293
Change-Id: I16b07a13ce2dca30079a889e2b314483d82d3674
Berk Hess [Tue, 31 Oct 2017 10:03:02 +0000 (11:03 +0100)]
Also print 1x1 pair-list setup to log
mdrun now prints the equivalent 1x1 pair-list setup in addition
to the NxM list setup. This is to clarify that we can use short
pair list buffers because of our cluster setup.
The list setup is now also printed in case we have a single list.
Removed the note on needing to increase nstlist with a GPU when
we automatically change nstlist.
Changed pick_nbnxn_kernel and nbnxn_atomdata_init to use mdlog
to get correct spacing between paragraphs.
Also cleaned up the verletbuf list setup getter functions.
Change-Id: Ic7b5967b0a62aee9fee9837f60a134fd571ff405
Aleksei Iupinov [Thu, 9 Nov 2017 18:01:43 +0000 (19:01 +0100)]
Rename and expose "generic" GPU memory transfer functions
Dropped the "_generic" suffix from the names. Made the sync/async
argument an enum class instead of boolean.
Made PME use synchronous versions of the functions for unit tests.
Change-Id: I5fd2490d58370d9f0405aea1a74237fa8107cbab
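The change from a boolean to an enum class for the sync/async argument, described in the commit above, can be illustrated with a minimal sketch. All names here (TransferKindSketch, copyToDeviceSketch) are hypothetical and are not the GROMACS API; a plain memcpy stands in for the device copy so that the sketch runs without a GPU.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical enum, for illustration only. At the call site,
// TransferKindSketch::Async is self-describing, whereas a bare
// true/false argument is not.
enum class TransferKindSketch
{
    Sync,
    Async
};

// Stand-in for a host-to-device copy. Returns true when the caller may
// assume the transfer has completed on return. In this sketch the memcpy
// always completes immediately; a real asynchronous transfer would only
// be enqueued, which is why Async reports "not yet guaranteed complete".
inline bool copyToDeviceSketch(void* dst, const void* src, std::size_t bytes, TransferKindSketch kind)
{
    std::memcpy(dst, src, bytes);
    return kind == TransferKindSketch::Sync;
}
```

A call then reads copyToDeviceSketch(dst, src, n, TransferKindSketch::Sync), and passing a bool by mistake no longer compiles, because an enum class has no implicit conversion from bool.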
Erik Lindahl [Sun, 12 Nov 2017 13:14:10 +0000 (06:14 -0700)]
Only issue FFT warning messages on changes
Similar to other CMake modules, we should only issue
warnings at the first invocation, or if the FFT library
was changed.
Change-Id: I6dba59f1021984d9a744a55d797814c1c9d89b20
Roland Schulz [Sat, 8 Jul 2017 00:40:48 +0000 (17:40 -0700)]
PME-gather: 4xN SIMD
Speedup on KNL 11% for spread/gather (3% total) on ion-channel
Change-Id: I1a0624408b4e8f7bd441dfe2c260f80d211351d0
Mark Abraham [Tue, 24 Oct 2017 15:40:58 +0000 (17:40 +0200)]
Import cmake Modules/FindCUDA.cmake
CUDA 9.0 issues large numbers of -Wundef warnings from its internal
headers. FindCUDA.cmake should be including such headers as "system"
headers, so to prepare for a patch where it is modified to do that,
this commit imports that file from v3.4.3 of the CMake repository,
because that is a choice likely to work with all future versions of
CMake.
It needs some supporting cmake files that are included unmodified,
so GROMACS does not assert copyright on those. The main FindCUDA.cmake
file is modified only to be able to find those files.
Refs #2276
Change-Id: I69ad39dc805648a6cc5e27bb7fcd229f5f2a538a
Roland Schulz [Thu, 9 Nov 2017 22:43:45 +0000 (14:43 -0800)]
Rename load1DualHsimd to loadU1DualHsimd
The documentation didn't require any alignment, the test didn't use
alignment, and none of the implementations required any alignment.
But the name suggested that alignment is required.
The only current usage had 2-wide alignment, but requiring that
would make the function less general without any advantage.
Change-Id: I651c1327a3febc368cb4b039ad226d0771770e60
Roland Schulz [Thu, 9 Nov 2017 23:02:41 +0000 (15:02 -0800)]
AVX: Improve load1DualHsimd
instr+uop: 4->3, throughput/port-pressure(on 5): 3->1
(IACA numbers for IVB-SKL)
Change-Id: Id768cb951dcbace1473448fcd63fa7d40b0e7da6
Roland Schulz [Fri, 10 Nov 2017 02:08:08 +0000 (03:08 +0100)]
Revert "Use -mavx2 -mfma instead of -march with AVX2"
This reverts commit
062a6b81498b61b2bfc4ec7441b844d76aae445b.
Reason for revert: Breaks support for ICC (16-18) which doesn't have -mavx2 or -mfma.
Change-Id: I01cf3e9db332a405fd9419b6382240f5fcecf633
Aleksei Iupinov [Thu, 9 Nov 2017 18:07:30 +0000 (19:07 +0100)]
Rename synchronous GPU transfer functions to match the asynchronous ones
Change-Id: I5cb8e9cab208c1d0c62f985ec3140540ea427fb2
Mark Abraham [Wed, 8 Nov 2017 11:22:57 +0000 (12:22 +0100)]
Prepared t_mdatoms for using vector
Wrapped it in another C++ class because the group-scheme kernels
compile as plain C and this permits the contained t_mdatoms to
be unmodified. The class has responsibility for maintaining the
allocations for any of the fields of t_mdatoms that need to be
managed with a std::vector plus perhaps an allocator.
Change-Id: I6fef70beeb8d43f3e048cec02380f8ebf8153ecb
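The wrapping approach described in the commit above might look roughly like the following sketch. The names (MdAtomsPlainSketch, MdAtomsWrapperSketch) and the single chargeA field are assumptions chosen for illustration; the real t_mdatoms has many more fields and the real class may differ.

```cpp
#include <vector>

// Hypothetical stand-in for a struct that plain-C kernels must be able
// to read unchanged: only raw pointers and sizes, no C++ types.
struct MdAtomsPlainSketch
{
    int    nr;
    float* chargeA;
};

// Hypothetical C++ wrapper that owns the storage in a std::vector and
// keeps the raw pointer in the plain struct in sync with it.
class MdAtomsWrapperSketch
{
public:
    MdAtomsWrapperSketch() = default;
    // Copying would leave the copy's raw pointer aimed at the original's
    // storage, so copies are disabled in this sketch.
    MdAtomsWrapperSketch(const MdAtomsWrapperSketch&) = delete;
    MdAtomsWrapperSketch& operator=(const MdAtomsWrapperSketch&) = delete;

    // Resize the owned storage, then refresh the pointer, because
    // resizing a vector may reallocate.
    void resize(int numAtoms)
    {
        chargeA_.resize(numAtoms);
        mdatoms_.nr      = numAtoms;
        mdatoms_.chargeA = chargeA_.data();
    }

    // Plain-C code receives only this view.
    MdAtomsPlainSketch* mdatoms() { return &mdatoms_; }

private:
    std::vector<float> chargeA_;
    MdAtomsPlainSketch mdatoms_ = { 0, nullptr };
};
```

The design point is that the lifetime of the buffer belongs to the wrapper, while code that compiles as C continues to see an unmodified struct.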
Mark Abraham [Mon, 6 Nov 2017 07:45:49 +0000 (08:45 +0100)]
Introduce HostAllocationPolicy
This permits host-side standard containers and smart pointers to have
their contents placed in memory suitable for efficient GPU transfer.
The behaviour can be configured at run time during simulation setup,
so that if we are not running on a GPU, then none of the buffers that
might be affected actually are. The downside is that all such
containers now have state.
Change-Id: I9367d0f996de04c21312cef2081cc08148f80561
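To show what "all such containers now have state" means for the commit above, here is a minimal sketch of a stateful allocator whose behaviour is chosen at run time. The names (PinningModeSketch, HostAllocatorSketch) are hypothetical and this is not the GROMACS implementation; both modes allocate with ordinary operator new here, where a real policy would call a GPU runtime to page-lock the memory.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical run-time choice of allocation behaviour.
enum class PinningModeSketch
{
    Plain,
    PinnedForGpu
};

template<typename T>
class HostAllocatorSketch
{
public:
    using value_type = T;

    HostAllocatorSketch() = default;
    explicit HostAllocatorSketch(PinningModeSketch mode) : mode_(mode) {}
    // Converting constructor, needed when a container rebinds the allocator.
    template<typename U>
    HostAllocatorSketch(const HostAllocatorSketch<U>& other) : mode_(other.mode())
    {
    }

    T* allocate(std::size_t n)
    {
        // Assumption: a real policy would branch on mode_ here and
        // request page-locked memory for PinnedForGpu.
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t /*n*/) noexcept { ::operator delete(p); }

    PinningModeSketch mode() const { return mode_; }

private:
    // This member is the state that every container using the allocator carries.
    PinningModeSketch mode_ = PinningModeSketch::Plain;
};

// Allocators with different modes compare unequal, so containers know
// they cannot freely exchange storage.
template<typename T, typename U>
bool operator==(const HostAllocatorSketch<T>& a, const HostAllocatorSketch<U>& b)
{
    return a.mode() == b.mode();
}
template<typename T, typename U>
bool operator!=(const HostAllocatorSketch<T>& a, const HostAllocatorSketch<U>& b)
{
    return !(a == b);
}
```

A std::vector<float, HostAllocatorSketch<float>> constructed with a PinnedForGpu allocator reports that mode from get_allocator(), while a default-constructed one stays Plain, which matches the idea that buffers are unaffected when not running on a GPU.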
Roland Schulz [Mon, 30 Oct 2017 18:33:33 +0000 (11:33 -0700)]
ICC should use ZMM if code anyhow uses ZMM
Change-Id: Iaea73df12065b3d4ba1974e48b864f44c9b7fe44
Roland Schulz [Wed, 25 Oct 2017 19:24:42 +0000 (12:24 -0700)]
Fix scalar blend
Change-Id: I580af279cdba494ec13029259e4fd0867a7e5ea2
Magnus Lundborg [Mon, 23 Oct 2017 11:20:00 +0000 (13:20 +0200)]
Update to TNG v 1.8.1
Fixes #2187 and #2250.
Change-Id: Icf81d5f3ce916e984750e1511d32e16ebc45b6f9
Szilárd Páll [Wed, 18 Oct 2017 15:01:54 +0000 (17:01 +0200)]
Use -mavx2 -mfma instead of -march with AVX2
This was (likely) only a workaround for some early gcc version that did
not support correct AVX2 code-generation with just the -mavx2 -mfma
flags. However, as with other SIMD flavors, with AVX2 we should also
not request arch-specific tuning just to get the desired SIMD flavor
enabled.
Change-Id: Ib0c6388bebcffbf0719b438451d3943f51fba4a4
Mark Abraham [Tue, 7 Nov 2017 02:26:32 +0000 (03:26 +0100)]
Reform gmx_pme_pp alloc and use vector
Introduced a helper struct for describing the partner PP ranks.
Reduced some of the conditional compilation.
Updated some naming from node to rank.
Fixed over-use of charge_pp.
Change-Id: I00b59dd116740721ed707af4242c0d44f1615d56
Mark Abraham [Mon, 6 Nov 2017 07:43:24 +0000 (08:43 +0100)]
Introduce gmxopencl.h
This header wraps the different ways to include the main OpenCL header
on different platforms, including suppressions for the warnings about
usage of deprecated API elements. NVIDIA only officially supports the
version with the deprecated elements, so we need to continue to use it.
Change-Id: Ie24f20d43272e1747bcbd693815e96cc200d5f50
Szilárd Páll [Fri, 20 Oct 2017 20:26:25 +0000 (22:26 +0200)]
Merge common nbnxn CUDA/OpenCL GPU wait code-paths
The entire GPU wait including timing accumulation as well as staging
data reduction of the nonbonded GPU modules has been unified by
including a single templated version of the code into the common header.
Code has only been moved and changed in minor ways when necessary (e.g.
for the rvec reduction).
Change-Id: Ic9c9690be58a78f92ca99d2af30068e19c19cc6c
Mark Abraham [Tue, 10 Oct 2017 10:09:10 +0000 (10:09 +0000)]
Test clang on ARM in nightly matrix
Also suppress lots of compiler warnings from useless use of
__vectorcall on this target for this compiler.
ARM are targeting clang for future development, so hopefully this
either isn't needed or will work in future. Either way, this change
will continue to do the right thing.
Change-Id: I211952a24aefee8434cc6b32322f359b2a22687b
Szilárd Páll [Mon, 23 Oct 2017 14:11:46 +0000 (16:11 +0200)]
Add wallcycle timer for the PME GPU F reduction
Change-Id: I85185f2acdf3ebdcbac109ef723eb458bc0e9008
Szilárd Páll [Fri, 20 Oct 2017 17:52:13 +0000 (19:52 +0200)]
Split off nbnxn GPU timing and staging reduction
Code reorganization that moves the timing related functions as well as
energy and shift force reduction into separate functions in both CUDA
and OpenCL versions of nbnxn_gpu_wait_for_gpu().
Change-Id: Ic5c9694d9de7f80a772e97f5c9e05bab77a3b82a
Mark Abraham [Tue, 7 Nov 2017 01:52:18 +0000 (02:52 +0100)]
Improve PME includes
Changing an internal ewald-module header for GPU support should not
lead to files outside that module needing to be recompiled. Moved enum
declarations for use outside the module to the header file that
declares such things. Restored necessary includes that were being
satisfied transitively from the internal header, that were prematurely
removed in
fae8902688dc48be56e.
Change-Id: I18c3146e80aba9ad0a2c485f2355bc214cbb083c
Szilárd Páll [Fri, 20 Oct 2017 18:55:45 +0000 (20:55 +0200)]
Deduplicate CUDA and OpenCL timer struct
The struct is identical in both CUDA/OpenCL so it's better placed in a
common header, but this needs to be an internal-only header as it pulls
in CUDA dependencies.
Change-Id: I907d68b7c298f2ba0e7a1af2baf4819f637e2f2e
David van der Spoel [Thu, 12 Oct 2017 07:06:44 +0000 (09:06 +0200)]
Fixed check for water in gen_vsite.cpp
Pdb2gmx would break when generating virtual sites if water oxygens
were not named OW. Now checking for the atomnumber instead.
Fixes #2268
Change-Id: I326f683e4940ad02351dcbe0c00e266a82b203f6
Mark Abraham [Fri, 3 Nov 2017 01:05:42 +0000 (02:05 +0100)]
Merge "Merge branch release-2016"
Berk Hess [Wed, 1 Nov 2017 16:21:48 +0000 (17:21 +0100)]
Fix Ekin at step 0 with COM removal
The kinetic energy at step 0 was computed from the velocities without
the center of mass velocity removed. This could cause a relatively
large jump in kinetic energy, especially for small systems.
Now compute_globals is called twice with COM removal so we get
the correct kinetic energy.
Appropriate mdrun tests for energy-conserving integrators are also added.
Change-Id: I87ab08d21a35621735ab3c65fc50af9992120be3
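The size of the kinetic-energy jump described in the commit above can be seen in a small worked sketch. This is not the mdrun code path (compute_globals is not modelled); ParticleSketch and the helper functions are hypothetical, and the sketch only compares the kinetic energy computed before and after subtracting the center-of-mass velocity.

```cpp
#include <array>
#include <vector>

// Hypothetical particle type for illustration.
struct ParticleSketch
{
    double                mass;
    std::array<double, 3> v;
};

// Mass-weighted average velocity of the system.
std::array<double, 3> comVelocity(const std::vector<ParticleSketch>& particles)
{
    double                totalMass = 0.0;
    std::array<double, 3> vcom      = { { 0.0, 0.0, 0.0 } };
    for (const auto& p : particles)
    {
        totalMass += p.mass;
        for (int d = 0; d < 3; d++)
        {
            vcom[d] += p.mass * p.v[d];
        }
    }
    if (totalMass > 0.0)
    {
        for (int d = 0; d < 3; d++)
        {
            vcom[d] /= totalMass;
        }
    }
    return vcom;
}

// Sum of 0.5 * m * |v|^2 over all particles.
double kineticEnergy(const std::vector<ParticleSketch>& particles)
{
    double ekin = 0.0;
    for (const auto& p : particles)
    {
        ekin += 0.5 * p.mass * (p.v[0] * p.v[0] + p.v[1] * p.v[1] + p.v[2] * p.v[2]);
    }
    return ekin;
}

// Subtract the center-of-mass velocity from every particle.
void removeComVelocity(std::vector<ParticleSketch>* particles)
{
    const std::array<double, 3> vcom = comVelocity(*particles);
    for (auto& p : *particles)
    {
        for (int d = 0; d < 3; d++)
        {
            p.v[d] -= vcom[d];
        }
    }
}
```

For two unit-mass particles moving at 1 and 3 along x, the kinetic energy is 5 before removal and 1 after it. The difference of 4 equals 0.5 * M * vcom^2 with M = 2 and vcom = 2, which is the contribution that would show up as a jump if step 0 were reported with the center-of-mass motion still included.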
David van der Spoel [Tue, 31 Oct 2017 12:25:56 +0000 (13:25 +0100)]
New mdp input for electric fields.
New format for MDP input for electric fields that is consistent
with the manual and that is comprehensible.
Change-Id: I5f9f434080f5217d2473c16377aee962692b9ee9