BioD PNPI Git Repos - alexxy/gromacs.git/log

Mark Abraham [Mon, 4 Dec 2017 10:56:46 +0000 (21:56 +1100)]

Fix free_gpu

If a device context was not used, CUDA gives an error if we attempt to
clear it, so we must avoid clearing it.

Refs #2322

Change-Id: I67b8b2d263eaed9c7489a6de6f612b27496cc6c2

Berk Hess [Wed, 29 Nov 2017 16:43:18 +0000 (17:43 +0100)]

Fixed initial temperature reporting

Fixes #2314

Change-Id: I13dec05ede9b4ad976c22b4910ee02256dcaac74

Mark Abraham [Mon, 4 Dec 2017 10:15:15 +0000 (21:15 +1100)]

Fix unused variables in CPU pruning kernels

aj is unused when UNROLLJ != STRIDE, so should not be declared

Change-Id: I889e47a3d62ad5644a96c6d05403b4b9285975e4

Berk Hess [Mon, 4 Dec 2017 06:55:31 +0000 (07:55 +0100)]

Update mdrun signal help text

Updated mdrun help text on signal handling for old and recent changes
to the behavior.

Fixes #2324

Change-Id: I48dd30b7da3a1dc57331978c7d3b0e1509850187

Berk Hess [Sun, 3 Dec 2017 21:15:20 +0000 (22:15 +0100)]

Only stop at nstlist steps with -reprod

Stopping mdrun with two INT or TERM signals would always happen right
after the first global communication step. But this breaks exact
continuation. Now with mdrun -reprod a second signal will still stop
at a pair-list generation step, like with the first signal, so we can
still have exact continuation.

Refs #2318

Change-Id: If65c1215d2509d60c1c5a6444769e7809288e798
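
The stopping rule described in this commit message can be sketched as
follows. This is only an illustration of the stated behavior, not the
mdrun implementation: the function name, the `nstglobalcomm` parameter
and the "next multiple of the interval" rounding are assumptions.

```python
def stop_step(current_step, signals_received, nstlist, nstglobalcomm, reprod):
    """Return the step at which a signalled run would stop (hypothetical helper).

    Assumes pair-list generation happens every `nstlist` steps and global
    communication every `nstglobalcomm` steps, both counted from step 0.
    """
    if signals_received < 1:
        raise ValueError("no stop signal received")

    def next_multiple(step, interval):
        # First multiple of `interval` strictly after `step`.
        return (step // interval + 1) * interval

    # First signal, or any signal with -reprod: stop at a pair-list step,
    # which keeps exact continuation possible.
    if signals_received == 1 or reprod:
        return next_multiple(current_step, nstlist)
    # Second signal without -reprod: stop at the next global communication step.
    return next_multiple(current_step, nstglobalcomm)
```

With `nstlist=10` and `nstglobalcomm=4`, a second signal at step 105 stops
the sketch at step 108 without `reprod` and at step 110 with it.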

Erik Lindahl [Wed, 29 Nov 2017 21:58:38 +0000 (22:58 +0100)]

Fix compilation issues for AVX-512

- gcc-5.4.0 incorrectly requires the second argument of
  _mm512_i32gather_pd() to be a double pointer instead
  of void, but this should fix compilation for both
  cases.
- Work around double precision permute instruction
  only available with AVX512VL instructions.

Fixes #2312.

Change-Id: I31420e71064b1c5c25c8af29a1d41c7f372375c1

Berk Hess [Sat, 2 Dec 2017 21:37:29 +0000 (22:37 +0100)]

Clear vsite velocities for simple integrators

The simple integrator loops (introduced in 69470fc4) do not clear
the velocities of virtual sites. This allows velocities of virtual
sites to slowly increase over time. To prevent this, velocities
of virtual sites are now cleared in a separate loop.

Fixes #2316

Change-Id: I12ff0fae2cd3c45ad4e63bfeccfc8c88505cdb1e

Mark Abraham [Sun, 3 Dec 2017 12:30:01 +0000 (23:30 +1100)]

Fix fft5d pinning

A CUDA build on a node with no driver installed can never have
selected a CUDA pinning policy, and erroneously unpinning leads to a
fatal error. Instead, FFT5D now remembers whether it made pinning
possible, which can only occur when there was a driver and a valid
device, so that it can unpin only when appropriate.

Removed some C++ guards and named a variable more precisely.

Noted a TODO to make a Jenkins configuration to test this code
path.

Fixes #2322

Change-Id: I50ae9cdeeb26ac0d0bd5ecf48b28b44cf0716745

Berk Hess [Fri, 1 Dec 2017 13:03:11 +0000 (14:03 +0100)]

Tighten B-spline moduli single precision test tolerance from 6 to 1 ULP

Also get rid of the misused double precision tolerance helper.

Change-Id: I2babd8a7c5d4ab436e67a8b8d1ec0532a482ec94
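
For readers unfamiliar with ULP-based tolerances, the sketch below shows
what "within N ULP" means for single precision values. It is a generic
illustration and not the GROMACS test framework; the function names are
made up for this example.

```python
import struct


def float32_ordered_int(value):
    """Map a float32 bit pattern to an integer that is monotonic in the value."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if bits & 0x80000000:
        # Negative numbers are stored as sign-magnitude; mirror them below zero.
        return -(bits & 0x7FFFFFFF)
    return bits


def ulp_distance(a, b):
    """Number of representable float32 values between a and b."""
    return abs(float32_ordered_int(a) - float32_ordered_int(b))


def within_ulps(a, b, max_ulps):
    """True if a and b differ by at most max_ulps single precision ULPs."""
    return ulp_distance(a, b) <= max_ulps
```

A value that is 6 ULP away from the reference passes a 6 ULP tolerance but
fails a 1 ULP tolerance, which is the kind of tightening this commit makes.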

Berk Hess [Thu, 30 Nov 2017 15:17:58 +0000 (16:17 +0100)]

Avoid assertion failure in AWH

With an unstable reaction coordinate or unequilibrated system, AWH
could cause an assertion to fail. Now AWH checks for valid coordinate
input and throws an exception with a clear message.

Change-Id: I059d9cd9fbff74fc096a9c1e4c16cf8d84b2118a

Viveca Lindahl [Thu, 30 Nov 2017 18:24:34 +0000 (19:24 +0100)]

Correct AWH input file name in documentation

Mdrun was expecting user input data file 'awhinit.xvg' while the
mdp-option documentation has 'awh-init.xvg'.

Change-Id: I66a4957d58a7808213029bb33ad1ab69cacc304f

Berk Hess [Mon, 27 Nov 2017 21:02:50 +0000 (22:02 +0100)]

Change the GPU SMT cut-off to quadratic

The advantage of SMT diminishes rapidly with the number of cores.
So the system sizes should be compared to the square of the number
of cores.

Change-Id: I58f2efb3bb70b039452822bb18865c734ba52189
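
The difference between a per-core and a quadratic cut-off can be sketched
as below. The 10000 atoms per core figure comes from the "Limit SMT with
PME on GPU" commit in this log; the quadratic prefactor is a hypothetical
placeholder, since the commit message does not state the value used.

```python
# Threshold quoted in the "Limit SMT with PME on GPU" commit message.
MIN_ATOMS_PER_CORE = 10_000
# Hypothetical prefactor for the quadratic rule; not taken from the source.
MIN_ATOMS_PER_CORE_SQUARED = 1_000


def smt_allowed_linear(n_atoms, n_cores):
    """Per-core rule: system size compared to the number of cores."""
    return n_atoms >= MIN_ATOMS_PER_CORE * n_cores


def smt_allowed_quadratic(n_atoms, n_cores):
    """Quadratic rule: system size compared to the square of the core count."""
    return n_atoms >= MIN_ATOMS_PER_CORE_SQUARED * n_cores ** 2
```

Under the quadratic rule the system size needed to keep SMT on grows much
faster with the core count, which matches the stated motivation that the
advantage of SMT diminishes rapidly with the number of cores.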

Erik Lindahl [Tue, 28 Nov 2017 18:14:23 +0000 (19:14 +0100)]

Fix AVX-512 SIMD test for C

Avoid using C++ features in the test, since
it should test both the C and C++ compilers.

Change-Id: Ia85d925faff87f39bc7ae34c8eac813bd45a7e37

Mark Abraham [Fri, 24 Nov 2017 20:43:38 +0000 (21:43 +0100)]

GROMACS 2018 first beta release

Change-Id: I153d464d62e99f429dd31accf21bae033da4b61c

Berk Hess [Tue, 12 Sep 2017 12:42:32 +0000 (14:42 +0200)]

Refactor nbnxn exclusion setting

Consolidate common parts of the simple and GPU exclusion mask
generation code. Made variable names more descriptive.
No functionality and performance changes, except that the direct
j-cluster lookup now also works when the first j-cluster does not
equal the i-cluster.

Change-Id: I3ef6344ae2796e649ae30bf5ff0668a4548c011f

Aleksei Iupinov [Tue, 28 Nov 2017 09:46:40 +0000 (10:46 +0100)]

Replace pmeGPU variable name with pmeGpu

Sticking to one spelling is hard.

find ./src/gromacs/ewald -type f -exec sed -i 's/pmeGPU/pmeGpu/g' {} ';'

Change-Id: I80290027464343a034bfc0194fa2340663c2cbd0

Szilárd Páll [Tue, 28 Nov 2017 00:29:15 +0000 (01:29 +0100)]

Separate local GPU NB wait from emulation launch

Refactoring splits the combined nonbonded GPU and emulation path
conditional in preparation for the polling/alternating wait.

Change-Id: I685fb0ca5f72b92efb510d281c1078011f0c8b16

Szilárd Páll [Tue, 28 Nov 2017 00:08:15 +0000 (01:08 +0100)]

Refactor GPU rolling prune launch

Move the launch call and related logic to separate function.

Change-Id: Iba4d331fa0aa14b95a43d752d04ea3e4833667a2

Erik Lindahl [Fri, 24 Nov 2017 19:26:56 +0000 (20:26 +0100)]

Improve status messages for SIMD checks

The present SIMD checks always include a blank flag as the last
variable. This will (obviously) always work as a compiler flag,
and then we instead issued a status message saying a compiler
flag was found, but updating binutils might help. This alters
the error message to reflect that no flag was found, and we are
intentionally unspecific and suggest updating the compiler and/or
binutils.

Change-Id: If99aeac391adc709c295e71b94583a8f9e395b0e

Roland Schulz [Mon, 27 Nov 2017 19:46:00 +0000 (11:46 -0800)]

Fix bugs introduced in c4cd996

Flags were being appended to the cache every time, causing an endless
loop with cmake 3.6.3 and ninja 1.7.2. Appending is not necessary. The
only case which is multi-step (AVX128/FMA) appends manually anyhow.
Flags were appended by gmx_find_flags before c4cd996, but its usage
caused the flags to be appended in the cache. This bug-fix changes
gmx_find_flags rather than the usage because functions shouldn't
append outputs.

Change-Id: I0d9ae1edae937fe871f0085894858746778ffe0f

Szilárd Páll [Mon, 27 Nov 2017 01:40:00 +0000 (02:40 +0100)]

Time FFT/gather in mixed mode

Add separate cycle counters as the main PME ones are not practical as
they are treated in a special way, assumed to be part of the separately
accounted for "PME mesh" section.

Change-Id: I189f256e7df24ff34420edfa8f6a3729709930fa

Aleksei Iupinov [Mon, 27 Nov 2017 15:04:41 +0000 (16:04 +0100)]

Do not force separate PME rank to recompute reciprocal box every step

Change-Id: I5015a13ecda6b14831619d7fb1e78fde4eb2002d

Erik Lindahl [Fri, 24 Nov 2017 19:45:24 +0000 (20:45 +0100)]

Test more AVX-512F instructions before enabling support

Test instructions that are not implemented on
earlier versions of clang, and fix a bug where the
KNL version of the avx test was called by mistake due
to similar names.

Change-Id: I47bbaf363feac9f6b34276295508e147d0c9e3e8

Aleksei Iupinov [Mon, 27 Nov 2017 14:20:51 +0000 (15:20 +0100)]

Disable failing separate PME rank tests

Change-Id: I2a954c766b06dcc99bf69c27fa1d56f275454adf

Aleksei Iupinov [Mon, 27 Nov 2017 09:50:23 +0000 (10:50 +0100)]

Add tests for PME tuning and fix mixed PME mode with tuning

Fixes #2303

Change-Id: Id9acd3d9b01b585d772b2009dcea9a9e0b8b0539

Berk Hess [Sat, 25 Nov 2017 08:57:53 +0000 (09:57 +0100)]

Enable auto thread pinning with thread limiting

Recently the possibility of automatically limiting the number of OpenMP
threads was introduced (with PME on the GPU). Note that this could
already happen with very small systems. When not using all hardware
threads, pinning would be disabled and a warning was issued.
Now pinning is enabled and no warnings are issued when the user did
not specify any thread and pinning settings.

Changed "threads" in the mdrun -ntmpi description to "ranks".

Change-Id: I5f5688a4e2d35fdddbae3aeffae256158a13da5c

Paul Bauer [Mon, 13 Nov 2017 15:09:02 +0000 (16:09 +0100)]

Add GPU task information to user-guide

Started addition of information to the user guide concerning different
tasks that can be assigned to GPUs.

Change-Id: Ia83bf348620e51b0dac9bc186bde8b8a55479081

Berk Hess [Mon, 27 Nov 2017 07:51:54 +0000 (08:51 +0100)]

Fix PME GPU with walls

The unscaled box was used to compute the volume used by PME solve.

Change-Id: I631774ce1a237b5072f0df22741aaf6c116c8e4f

Berk Hess [Sat, 25 Nov 2017 20:46:05 +0000 (21:46 +0100)]

Fix compiler warning without MPI

Change-Id: I07675d1b5bd4ffbf8e4a29db9a95f721e3051ae5

Aleksei Iupinov [Fri, 24 Nov 2017 21:15:33 +0000 (22:15 +0100)]

Destroy PME GPU context

PME-only GPU rank change I82f3950b0e04b0bc21843a8124a9bd5c68b15024
forgot to do this.

Change-Id: I2f5deb35b33b5c8fb4fb15bd4f4fbd798838a87d

Aleksei Iupinov [Thu, 16 Nov 2017 17:07:05 +0000 (18:07 +0100)]

Enable separate PME GPU rank

This relaxes a few task assignment restrictions to allow
a simulation with a single separate PME GPU rank to work. E.g.

gmx mdrun -pme gpu -ntmpi 4 -npme 1

works if all the single rank PME GPU requirements are satisfied.

Default behavior is not changed, new behavior is opt-in.

The existing single-rank PME command line sanity test is altered
to also be included in multi-rank tests, testing a separate PME rank.

Change-Id: I82f3950b0e04b0bc21843a8124a9bd5c68b15024

Aleksei Iupinov [Thu, 23 Nov 2017 14:02:22 +0000 (15:02 +0100)]

Do not call mixed CPU+GPU PME mode "Hybrid"

Change-Id: I2739d366f819c1d50bed10f40e887ba571fa56e4

Aleksei Iupinov [Wed, 22 Nov 2017 09:52:40 +0000 (10:52 +0100)]

Activate mixed mode of PME on GPU

This allows the mixed PME mode (spread/gather on GPU, FFT/solve on CPU)
to be triggered manually with -pme gpu -pmefft cpu. Behavior of other
values of -pme and -pmefft is not altered. Testing is enabled as well.

Change-Id: I2a525e2a7003a392629b2ca2ed4051d9245dac1b

Berk Hess [Fri, 24 Nov 2017 15:09:22 +0000 (16:09 +0100)]

Disabled CUDA timings

CUDA timings are incorrect with multiple streams and currently
we can not query for other streams in the non-bonded or PME module,
so disabled them by default. Added a GMX_ENABLE_GPU_TIMING env.var.
Removed deprecated env.vars.

Change-Id: I55ab98d7fea8fa90782e8346ad73b2d2a2171a1d

Szilárd Páll [Wed, 22 Nov 2017 02:09:00 +0000 (03:09 +0100)]

Add GPU sharing post-submit config

Also added some TODOs to the top of the list with the thinking that
these would be beneficial to test code that we're changing now (e.g.
memory management code).

Change-Id: Ic8ed571f571d6755fecb753d04a73f25f507ad22

Mark Abraham [Fri, 20 Nov 2015 19:43:58 +0000 (20:43 +0100)]

Update developer guide

Imported and updated more material from the wiki. Included coverage of
some recent discussion points on C++11 and preprocessor use.

Change-Id: I72dc1e656f329fe7de9feadc77de84bc39b6eb29

Mark Abraham [Fri, 3 Nov 2017 02:20:21 +0000 (03:20 +0100)]

Activate PME on GPUs

Adds support for a run with a single rank, running NB and PME on the
same GPU.

Fixed bug in thread-MPI gather and gatherv functions. Real MPI does
not require that the send buffer is non-NULL if the send count is
zero, and the thread-MPI docs have not documented that this is an
intended extra requirement. This would be a problem only if e.g.
memcpy(dest,src,0) itself dereferenced src.

TODO user docs

Change-Id: Iace9e720c0958eaacaaa81307f1fe7324fa4c9ac

Aleksei Iupinov [Fri, 24 Nov 2017 14:20:50 +0000 (15:20 +0100)]

Remove outdated (already implemented) pinning TODO

Change-Id: I122b93497aa168aeb354bd63059398c1738fd110

Berk Hess [Tue, 7 Nov 2017 11:23:09 +0000 (12:23 +0100)]

Speed up nbnxn buffer clearing

Change-Id: If1b58589aee4022cff9e6f120526b4e844967501

Mark Abraham [Fri, 3 Nov 2017 01:33:01 +0000 (02:33 +0100)]

Separate management of GPU contexts from modules

Tasks from modules might share GPU contexts across either tasks, or
thread-MPI ranks, so init and free operations can't be the
responsibility of the modules themselves.

Simplified the error reporting for init and free. Knowledge of the
rank ID might help in diagnosing issues in some cases, but should
(later) be the responsibility of a proper framework to catch errors
during initialization across MPI ranks.

Moved the GPU profiling cleanup back to where it was intended to be,
before some earlier refactoring had left it somewhere not-quite-right.

Change-Id: I682a1b1c7058cbebb41805dba05e688cbee18c2a

Erik Lindahl [Mon, 13 Nov 2017 17:48:05 +0000 (10:48 -0700)]

Detect AVX-512 FMA units to choose best SIMD

Add a test program that times AVX-512 code to
detect single vs. dual AVX-512 FMA units.
Added CMake code to always compile this file
with AVX-512 flags, both at CMake configuration
and runtime.
Tested to work on both AVX2 and AVX-512 hardware
with dual FMAs, and by manually faking single
FMA units, but when we get access to hardware
with a single AVX-512 FMA unit we need to check
that we produce the correct result.

Change-Id: I6240e864bc77f95085c5cd3303a84ab581eb3662

Mark Abraham [Thu, 2 Nov 2017 18:51:16 +0000 (19:51 +0100)]

Add PME integration test

As support for executing PME on GPUs is integrated, this test will
make a simple way to ensure things work as expected regardless of
what build type and hardware is present.

Change-Id: I2ab95b6f84eef9fe18b6858a8886221706706a14

Mark Abraham [Wed, 23 Aug 2017 14:06:44 +0000 (16:06 +0200)]

Extend task assignment code

Existing behaviour is largely unchanged, apart from some details of
how conditions that prevent task assignment are handled, and when.

However it is not feasible in the longer term to continue to implement
a way for gmx mdrun -gpu_id to imply the thread-MPI rank split, so
that is disabled now, along with a useful error message. Instead, for
both real and thread MPI, -gpu_id now limits the available GPU IDs
(issuing an error if there are any duplicates), somewhat like
CUDA_VISIBLE_DEVICES. The new mdrun -gputasks option specifies a full
GPU task assignment, and must be accompanied by a choice of ranks and
what kind of device receives tasks of each type. Documentation is
updated accordingly.

Aspects of the implementation anticipate the extension to support
long-ranged PME interactions on GPUs, and others in future, so that
the task assignment on a node now takes the form of a container of
tasks, potentially of different types, on each rank of the node. A
flat vector of ints is no longer sufficient.

Errors e.g. from inconsistent user input are now handled with
exceptions, so that the runner can take the responsibility of
reporting those correctly, rather than always aborting the program at
the point where the issue is detected.

gmx tune_pme now explicitly only supports the new form of -gpu_id,
though it would not be difficult to support -gputasks if there
was need.

Change-Id: I0c149913bd43418d374171f5f95dad7f25d3cfe4
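
The new meaning of -gpu_id described in this commit message (restrict the
set of usable GPU IDs, and reject duplicates) can be sketched as follows.
This is an illustration only: the comma-separated input format and both
function names are assumptions, not the mdrun parsing code.

```python
from collections import Counter


def parse_available_gpu_ids(gpu_id_arg):
    """Parse a comma-separated ID list (assumed format) and reject duplicates."""
    ids = [int(token) for token in gpu_id_arg.split(",")]
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    if duplicates:
        # The commit message says duplicates are an error; reported here as
        # an exception so that the caller decides how to present it.
        raise ValueError("duplicate GPU IDs in -gpu_id: %s" % duplicates)
    return ids


def filter_detected_gpus(detected_ids, gpu_id_arg):
    """Keep only detected devices that the user made available."""
    allowed = set(parse_available_gpu_ids(gpu_id_arg))
    return [i for i in detected_ids if i in allowed]
```

Raising an exception instead of aborting mirrors the commit's note that
input errors are now handled with exceptions so the runner can report them.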

Aleksei Iupinov [Fri, 24 Nov 2017 10:55:16 +0000 (11:55 +0100)]

Mention gmx_install_headers() in the developer manual

Change-Id: I703e63b7e8978f23963107044396331cbfe94d7e

Aleksei Iupinov [Tue, 21 Nov 2017 17:20:40 +0000 (18:20 +0100)]

Make FFT real-space grid use CUDA pinning for mixed PME mode

Change-Id: Ic0decb839285cfffd0de6da667e99a08b201183a

Berk Hess [Thu, 23 Nov 2017 15:07:34 +0000 (16:07 +0100)]

Limit SMT with PME on GPU

For small numbers of atoms per core, SMT can seriously deteriorate
performance when running both non-bondeds and PME on GPU.
With fewer than 10000 atoms per core, SMT is now always off by default
with PME on GPU and auto settings.

Change-Id: I1a6b83bc81f68e89bf443e2b0ddb1fde44e2361d

Aleksei Iupinov [Mon, 20 Nov 2017 17:14:30 +0000 (18:14 +0100)]

Pin coordinates, forces and charges for PME on GPU

This pins almost all the required buffers for both PME-only and PP+PME
ranks. The only non-pinned buffer is the FFT grid for not yet enabled
mixed mode. ChargesA in md_atoms now check for PME actually being
used on the rank to avoid the unpin assertion.

Change-Id: Ia4c1b7673cc5c32c8d3d3fcf252d9d923b94128d

Berk Hess [Fri, 13 Oct 2017 12:15:55 +0000 (14:15 +0200)]

Update note in manual on SD

The comment in the SD section about Berendsen was outdated.
Added a few sentences on equilibration/damping of modes.

Change-Id: I12595e881572526637019879ff0bb8ef872e57d6

Berk Hess [Thu, 23 Nov 2017 13:28:31 +0000 (14:28 +0100)]

Correct PME GPU timing naming

Change-Id: I4a39d246cecbfc7a84608dd9c450d57393f1248d

Szilárd Páll [Thu, 23 Nov 2017 11:51:17 +0000 (12:51 +0100)]

Move GPU FFT/gather launch after GPU nonbondeds

Changing the scheduling order makes sure that the launch overhead of GPU
FFTs does not keep the CPU busy, preventing nonbonded launch and its overlap
with PME.

Change-Id: I85cdb97cd7d7be6f5314da3a69d68616d8a93092

Pascal Merz [Tue, 28 Mar 2017 07:27:40 +0000 (01:27 -0600)]

Proof of concept: Physical validation suite

This is a proof of concept for a physical validation suite for GROMACS
(https://redmine.gromacs.org/issues/2070). It is kept separate from the
remaining code checks, for a number of reasons:
* Like the regression tests, it is separate from the code, i.e. it
  calls the complete binary from an external script.
* Unlike the regression tests, we are not comparing to the results from
  earlier versions, but to what is expected physically / mathematically.
* Unlike the existing tests, we are not able to keep these tests in the
  "seconds, not minutes" time frame, rather aiming for "hours, not days".
  They should therefore be run periodically, but probably not for every
  build.
  Also, given the long run time, it will in many cases be necessary to
  separate running of the systems (e.g. to run it at a specific time, or
  on a different resource), such that the make script gives the option
  to
  - prepare run files and an execution script,
  - analyze already present simulations,
  - or prepare, run and analyze in one go.
* Since those tests cannot be run at the same frequency as the current
  tests, they are kept strictly opt-in via `-DGMX_PHYSICAL_VALIDATION=ON`
  (see below for more details).
* Different tests need different simulation run-times - convergence tests
  need significantly shorter simulations than statistical ensemble tests.
  The tests could hence be run at different frequencies, with short tests
  being integrated in the Jenkins environment, and longer tests being
  triggered manually only.

USAGE WITH CMAKE / CTEST
* Run `cmake` in your build directory as usual.
* Run make:
    * All previously existing build targets are unchanged, including
      `make check`
    * `make check-phys` builds the main binaries, then runs the physical
       validation tests, expecting the simulations to have already been run.
    * `make check-all` is the combination of `make check` and
      `make check-phys`. It builds the main binaries and the test
      targets, then runs the Google tests, the regression tests and the
      physical validation tests.
    * `make check-phys-prepare` builds the main binaries, then prepares
      the validation tests by creating a subdirectory
      tests/physicalvalidation in the build directory, which contains the
      GROMACS input files as well as a bash file `run_simulations.sh` to
      run the necessary simulations outside of the build environment.
    * `make check-phys-run` prepares and runs the necessary simulations,
      and runs the tests once the simulations are finished.
A typical sequence of commands to run the physical validation tests would
hence either be
> cmake -DGMX_PHYSICAL_VALIDATION=ON ..
> make
> make check-phys-prepare
> # run simulations, possibly on external resource
> make check-phys
or, to run everything locally (WARNING - can take several hours!)
> cmake -DGMX_PHYSICAL_VALIDATION=ON ..
> make
> make check-phys-run

* Keeping the current `make check` target unchanged required some
  changes to the CMake files in the `tests/` directory.
* Note that currently `make check-phys-run` needs just under one hour
  to run on my workstation (10-core Xeon processor)

ADDED FILES
All additional files are located in the `tests/physicalvalidation` folder
or one of its subfolders.
* The physical validation is done with a python script (python3, but
  fully backwards compatible to 2.7), which can also be called
  independently of CTest. The script is called `gmx_physicalvalidation.py`
  and is relatively extensively documented. Try running it with `-h` and
  `--tests` to see the available options.
* The input files for the systems are stored in the `systems/` subfolder,
  containing a .top file containing the topology, a .gro file containing
  the starting structure, and a .mdp file containing the input options.
  The .mdp files are kept as minimal as possible to be sensitive to changes
  in GROMACS standard values.
* The physical validation script and the CMake files get information about
  the systems via the `systems.json` (single precision) and `systems_d.json`
  (double precision) files. These json-formatted files contain the system
  names, the tests to be performed on the systems, as well as additional
  options where needed.
* The actual computation of the tests is performed by a python package
  stored in the `physical_validation` folder. This is simply a snapshot of
  our code-independent physical validation package.

SYSTEMS:
Currently, the script is only running a few systems, checking convergence
of energy conservation in NVE system with decaying timestep, and the
ensembles generated by a few thermostating and barostating algorithms. We
have systems ready covering a broader combination of settings, but kept the
list shorter for this proof-of-concept to keep runtime short.

OPT-IN VIA `-DGMX_PHYSICAL_VALIDATION=ON`
* Physical validation tests are now strictly opt-in via
  `-DGMX_PHYSICAL_VALIDATION=ON`, as discussed in the developer telco.
  * If `-DGMX_PHYSICAL_VALIDATION=OFF` (default value),
    * `make check`, `make test` and a direct call to `ctest` are
      identical to before the inclusion of physical validation, i.e.
      they run google tests and (if available) the regression tests.
    * `make check-phys` does not run any tests, but issues a note that
      nothing will be checked since `-DGMX_PHYSICAL_VALIDATION=OFF`.
    * `make check-all` is identical to `make check`, but issues a note
      that physical validation is not available since
      `-DGMX_PHYSICAL_VALIDATION=OFF`.
  * If `-DGMX_PHYSICAL_VALIDATION=ON`,
    * `make check` is identical to before the inclusion of physical
      validation, i.e. it runs google tests and (if available) the
      regression tests.
    * `make test` and a direct call to `ctest` run all available tests,
      i.e. google tests, possible regression tests, physical validation
      tests.
    * `make check-phys` runs only the physical validation tests.
    * `make check-all` runs all available tests like `make test`, with
      the difference that the tests and all dependencies are built
      before running the tests.
  Since `GMX_PHYSICAL_VALIDATION` is `OFF` by default, existing setups
  should see no change in behavior.

Change-Id: I3155da64b7dfb43a42236030dce1399d36cb9655
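
The NVE convergence check mentioned under SYSTEMS in the commit message
above relies on a general property of Verlet-type integrators: the energy
fluctuation scales with the square of the time step. The toy sketch below
demonstrates this for a 1D harmonic oscillator with velocity Verlet. It is
a stand-alone illustration under those assumptions and is not code from
the physical validation package; the function name is made up.

```python
import math


def energy_rms_deviation(dt, total_time=50.0):
    """RMS deviation of the total energy from its initial value.

    Integrates a unit-mass, unit-stiffness harmonic oscillator with
    velocity Verlet, starting from x = 1, v = 0.
    """
    x, v = 1.0, 0.0
    initial_energy = 0.5 * v * v + 0.5 * x * x
    force = -x
    n_steps = int(round(total_time / dt))
    sum_squares = 0.0
    for _ in range(n_steps):
        v += 0.5 * dt * force   # first half kick
        x += dt * v             # drift
        force = -x              # force at the new position
        v += 0.5 * dt * force   # second half kick
        energy = 0.5 * v * v + 0.5 * x * x
        sum_squares += (energy - initial_energy) ** 2
    return math.sqrt(sum_squares / n_steps)


# Halving the time step should reduce the fluctuation by roughly a factor 4.
ratio = energy_rms_deviation(0.02) / energy_rms_deviation(0.01)
```

A validation test of this kind would compare the measured ratio against
the expected factor for the integrator, within some tolerance.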

Viveca Lindahl [Wed, 22 Nov 2017 10:15:50 +0000 (11:15 +0100)]

Fixed typo and corrected units in the friction tensor equation.

The friction should have units of energy*time/length^2.
Added unit of beta makes the units correct.

Change-Id: I9457e4724cefe87258ccc7f663982b349d4d4219

Viveca Lindahl [Tue, 21 Nov 2017 15:58:15 +0000 (16:58 +0100)]

Added friction tensor to AWH manual section

This also changes the lambda dimensional index from 'd' to 'mu'
for the whole AWH section.

Also fixes a typo.

Change-Id: I09203d7eabcb23482476c6d16c8dd75f5832a662

Mark Abraham [Thu, 9 Nov 2017 00:23:01 +0000 (17:23 -0700)]

Use much less PaddedRVecVector and more ArrayRef of RVec

Only code that handles allocations needs to know the concrete type of
the container. In some cases that do need the container type,
templating on the allocator will be needed in future, so that is
arranged here. This prepares for changing the allocator for state->x
so that we can use one that can be configured at run time for
efficient GPU transfers.

Also introduced PaddedArrayRef to use in code that relies on the
padding and/or alignedness attributes of the PaddedRVecVector. This
keeps partial type safety, although a proper implementation of such a
view should replace the current typedef.

Had to make some associated changes to helper functionality to
use more ArrayRef, rather than rely on the way rvec pointers could
decay to real pointers.

Used some compat::make_unique since that is better style.

Change-Id: I1ed3feb016727665329e919433bece9773b46969

Aleksei Iupinov [Tue, 21 Nov 2017 11:02:24 +0000 (12:02 +0100)]

Fix CUDA release build warning

Change-Id: I840c991f4eef394071626bfdcfcd14192e97a778

Aleksei Iupinov [Tue, 21 Nov 2017 07:19:26 +0000 (08:19 +0100)]

Fix pinned memory status checker tests

Change-Id: I69df298c5c022a21fe7e96d201adfe1d4dfb8563

Erik Lindahl [Sat, 11 Nov 2017 23:04:20 +0000 (16:04 -0700)]

Separate SIMD flag detection from management

This provides separate cmake functions to detect the flags
required to compile each SIMD instruction set. These
functions do not automatically add the SIMD flags to the
default flags, in order to make it possible to detect flags
for many SIMD instruction sets and use different flags for
different files.

Change-Id: I469a0cf0fafe3793d14dbcf0e72fd18f44a3bfe8

Mark Abraham [Sat, 11 Nov 2017 19:50:20 +0000 (12:50 -0700)]

Support pinning in HostAllocator

We want the resize / reserve behaviour to handle page locking that is
useful for efficient GPU transfer, while making it possible to avoid
locking more pages than required for that vector. By embedding the
pin()/unpin() behaviour into malloc() and free() for the allocation
policy, this can be safely handled in all cases.

Additionally, high-level code can now choose for any individual vector
when and whether a pinning policy is required, and even manually
pin and unpin in any special cases that might arise.

When using the policy that does not support pinning, we now use
AlignedAllocator, so that we minimize memory consumption.

Change-Id: I807464222c7cc7718282b1e08204f563869322a0
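
The idea of embedding pin()/unpin() into the allocation policy's malloc()
and free() can be sketched as below. This is a language-neutral mock-up,
not the GROMACS HostAllocator: the class names are hypothetical, and a
byte counter stands in for the real page-locking calls.

```python
class HostAllocationPolicySketch:
    """Allocation policy that pins on malloc and unpins on free (mock-up)."""

    def __init__(self, pin_memory):
        self.pin_memory = pin_memory
        self.pinned_bytes = 0  # stand-in for pages locked by the driver

    def malloc(self, n_bytes):
        buffer = bytearray(n_bytes)
        if self.pin_memory and n_bytes > 0:
            self.pinned_bytes += n_bytes  # real code would page-lock here
        return buffer

    def free(self, buffer):
        if self.pin_memory and len(buffer) > 0:
            self.pinned_bytes -= len(buffer)  # real code would unlock here


class HostVectorSketch:
    """Vector whose resize goes through the policy, so pinning follows size."""

    def __init__(self, policy):
        self._policy = policy
        self._storage = policy.malloc(0)

    def resize(self, n_bytes):
        new_storage = self._policy.malloc(n_bytes)
        n_copy = min(n_bytes, len(self._storage))
        new_storage[:n_copy] = self._storage[:n_copy]
        self._policy.free(self._storage)
        self._storage = new_storage

    def __len__(self):
        return len(self._storage)
```

Because every allocation and release passes through the policy, the vector
never keeps more memory pinned than its current storage needs, which is
the property the commit message describes.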

Viveca Lindahl [Mon, 28 Mar 2016 12:34:35 +0000 (14:34 +0200)]

Add force correlation to AWH module

This change adds the calculation of force correlation statistics
during an AWH biased simulation.
The main quantity of interest is the time-integrated force
correlation, also known as the friction tensor
(see e.g. http://dx.doi.org/10.1103/PhysRevLett.108.190602).
The friction tensor defines a metric on the coordinate space and
the local volume element of this metric is a useful measure for
determining which regions need more or less sampling.
gmx awh prints the friction (tensor) and can also still process
energy files without friction data.

Change-Id: I164be4665004dea5b250e3c7ac135ac1c1cbd783

Paul Bauer [Wed, 11 Oct 2017 14:19:22 +0000 (16:19 +0200)]

First general additions to the documentation

Made a general pass through the documentation and fixed a number
of issues I found with links not being where they should be
or pointing nowhere.

Also tried to parse the log files from the linkchecker
script in the build test.

Change-Id: I29fabd0b824d90f205c13855a85b139f7b789da3

Viveca Lindahl [Mon, 28 Mar 2016 12:31:42 +0000 (14:31 +0200)]

Add reading and writing to AWH module

This change adds IO to the AWH module. AWH writes coordinate
free energies and distributions to an energy file block. The
reading is handled by a new tool gmx awh.

Change-Id: Ie30991bca376c2a648371db771fc5dfd8fca3715

Berk Hess [Mon, 23 Oct 2017 07:39:32 +0000 (09:39 +0200)]

Prevent PME tuning excessive grid scaling

We limit the maximum grid scaling to a factor 1.8. This allows
plenty of room for shifting work from PME on CPU to short-range
GPU kernels, but avoids excessive scaling for diminishing return
in performance for a significant increase in power consumption,
communication volume (which may with fluctuating network load not
show up during tuning) as well as limiting load balancing.

Change-Id: I85c02478faa6b67c063b6e1b45a9ac1755b2d81e
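
A cap like the one described in this commit message can be sketched as a
filter on candidate grid spacings. Only the factor 1.8 comes from the
commit message; interpreting "grid scaling" as the ratio of a candidate
spacing to the initial spacing, and the function name, are assumptions.

```python
MAX_GRID_SCALING = 1.8  # limit quoted in the commit message


def allowed_grid_spacings(initial_spacing, candidate_spacings):
    """Drop candidates that coarsen the grid by more than the allowed factor."""
    return [
        spacing
        for spacing in candidate_spacings
        if spacing / initial_spacing <= MAX_GRID_SCALING
    ]
```

For an initial spacing of 0.12, a candidate of 0.20 (factor about 1.67) is
kept while 0.24 (factor 2.0) is rejected.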

Mark Abraham [Sun, 19 Nov 2017 21:50:03 +0000 (22:50 +0100)]

Fix nightly matrix config

Since we added cmake warning detection, this has been failing, because
the older clang versions do not have openmp support.

Change-Id: Idbc47fd9ce3c5f329fb20ac36da48ea932f6ac03

Berk Hess [Sun, 12 Nov 2017 12:10:54 +0000 (13:10 +0100)]

Add SIMD for AWH

When using the convolved potential with AWH, a large number of double
precision exp() functions need to be evaluated at every step. These
are now SIMD accelerated.

Change-Id: If1e3a916469c4fd7e26740123009ae59b7927667

Viveca Lindahl [Sun, 12 Nov 2017 21:46:40 +0000 (22:46 +0100)]

Add AWH section to manual.

Brief theoretical background and practical aspects of AWH.

Change-Id: Ic794006b00b107b6c9d22a1cfabff4646c87ae4e

Viveca Lindahl [Mon, 28 Mar 2016 12:27:33 +0000 (14:27 +0200)]

Add AWH biasing module + tests

The AWH (Accelerated weight histogram) method is an adaptive biasing
method used for overcoming free energy barriers and calculating
free energies (see http://dx.doi.org/10.1063/1.4890371). Although
AWH can in general bias any system parameter, this change only
implements biasing of reaction coordinates. The actual force
distribution and coordinate handling is taken care of by the pull
code. AWH interacts with the pull code by registering itself as
the external potential module for the coordinate that should be
AWH biased. The AWH code sets the potential and force for those
coordinates.

The Grid test checks that the neighborhoods are correct.
The Bias tests check the force, bias and free energy values
for the final and initial phase, with MC and convolved force
and without and with skipping updates.

Change-Id: I202f58f7042e8e63c9d708fdcaca6da7e8a4022e

commit | commitdiff | tree

Mark Abraham [Tue, 24 Oct 2017 15:40:58 +0000 (17:40 +0200)]

Fix FindCUDA.cmake for -Wundef

CUDA 9.0 issues large numbers of -Wundef warnings from its internal
headers. FindCUDA.cmake should be including such headers as
"system" headers, so it is modified to do that.

Fixes #2276

Change-Id: I36ca1208a0597215bf2f1e38d849786e9ac2fed4

commit | commitdiff | tree

Magnus Lundborg [Tue, 14 Nov 2017 11:06:09 +0000 (12:06 +0100)]

Added Magnus Lundborg as contributor in the manual.

I noticed I was not listed so I shamelessly added myself.

Change-Id: Idf589cabc319dab20abb1a40d73ce14692e71de2

commit | commitdiff | tree

Aleksei Iupinov [Tue, 7 Nov 2017 12:40:17 +0000 (13:40 +0100)]

Asynchronous CUDA transfers are asserted to always use pinned memory

A few tests are also added for the memory status checker function.

Change-Id: Ifd6c9b2a0ad8ad8c325dc553534314e81e5d8bd7

commit | commitdiff | tree

Mark Abraham [Wed, 8 Nov 2017 11:25:41 +0000 (12:25 +0100)]

Use HostAllocationPolicy for chargeA

Change-Id: Id230a0571ff8e4f8f69e347a3aea4e3c44413676

commit | commitdiff | tree

Mark Abraham [Fri, 17 Nov 2017 18:58:39 +0000 (11:58 -0700)]

Add checking function for whether a buffer is pinned

This is useful for several kinds of tests proposed.

Change-Id: If9fdd29e73f16299190b5485f473f6388aab9ec9

commit | commitdiff | tree

Mark Abraham [Mon, 6 Nov 2017 08:28:01 +0000 (09:28 +0100)]

Improve handling of PME GPU force buffer

Managed it with the HostAllocator, and moved the responsibility
for its lifetime to the PME GPU staging structure. The buffer
does not use CUDA pinning yet.

Change-Id: Ia6fdbdb2509137fec1c6cf2a4ac8c04b1696b58f

commit | commitdiff | tree

Aleksei Iupinov [Fri, 10 Nov 2017 15:04:35 +0000 (16:04 +0100)]

Make gpu_utils-test build with GMX_CLANG_CUDA

Same workarounds are applied to libgpu_utilstest as for libgromacs.
Renamed libgpu_utilstest target to gpu_utilstest_cuda to avoid the
double "lib" prefix in the filename.

Refs #2259, #2293

Change-Id: I16b07a13ce2dca30079a889e2b314483d82d3674

commit | commitdiff | tree

Berk Hess [Tue, 31 Oct 2017 10:03:02 +0000 (11:03 +0100)]

Also print 1x1 pair-list setup to log

mdrun now prints the equivalent 1x1 pair-list setup in addition
to the NxM list setup. This is to clarify that we can use short
pair list buffers because of our cluster setup.
The list setup is now also printed in case we have a single list.
Removed the note on needing to increase nstlist with a GPU when
we automatically change nstlist.
Changed pick_nbnxn_kernel and nbnxn_atomdata_init to use mdlog
to get correct spacing between paragraphs.

Also cleaned up the verletbuf list setup getter functions.

Change-Id: Ic7b5967b0a62aee9fee9837f60a134fd571ff405

commit | commitdiff | tree

Aleksei Iupinov [Thu, 9 Nov 2017 18:01:43 +0000 (19:01 +0100)]

Rename and expose "generic" GPU memory transfer functions

Dropped the "_generic" suffix from the names. Made the sync/async
argument an enum class instead of boolean.
Made PME use synchronous versions of the functions for unit tests.

Change-Id: I5fd2490d58370d9f0405aea1a74237fa8107cbab
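
To illustrate why an enum class reads better than a boolean for the
sync/async argument, here is a host-only sketch. It does not reproduce the
GROMACS transfer functions: TransferKind and copyBuffer are hypothetical
names, and std::memcpy stands in for the actual GPU copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical enum replacing a bare "bool async" parameter.
enum class TransferKind
{
    Sync,
    Async
};

// Host-only stand-in for a GPU transfer. The copy itself is done with
// memcpy; the return value reports whether the caller would still have to
// synchronize before using the destination (true only for Async).
bool copyBuffer(void* dst, const void* src, std::size_t bytes, TransferKind kind)
{
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, bytes);
    return kind == TransferKind::Async;
}
```

A call such as copyBuffer(dst, src, n, TransferKind::Async) documents itself
at the call site, whereas copyBuffer(dst, src, n, true) would not, and an
enum class cannot be passed accidentally where an integer or pointer
argument was intended.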

commit | commitdiff | tree

Erik Lindahl [Sun, 12 Nov 2017 13:14:10 +0000 (06:14 -0700)]

Only issue FFT warning messages on changes

Similar to other CMake modules, we should only issue
warnings at the first invocation, or if the FFT library
was changed.

Change-Id: I6dba59f1021984d9a744a55d797814c1c9d89b20

commit | commitdiff | tree

Roland Schulz [Sat, 8 Jul 2017 00:40:48 +0000 (17:40 -0700)]

PME-gather: 4xN SIMD

Speedup on KNL 11% for spread/gather (3% total) on ion-channel

Change-Id: I1a0624408b4e8f7bd441dfe2c260f80d211351d0

commit | commitdiff | tree

Mark Abraham [Tue, 24 Oct 2017 15:40:58 +0000 (17:40 +0200)]

Import cmake Modules/FindCUDA.cmake

CUDA 9.0 issues large numbers of -Wundef warnings from its internal
headers. FindCUDA.cmake should be including such headers as "system"
headers, so to prepare for a patch where it is modified to do that,
this commit imports that file from v3.4.3 of the CMake repository,
because that is a choice likely to work with all future versions of
CMake.

It needs some supporting cmake files that are included unmodified,
so GROMACS does not assert copyright on those. The main FindCUDA.cmake
file is modified only to be able to find those files.

Refs #2276

Change-Id: I69ad39dc805648a6cc5e27bb7fcd229f5f2a538a

commit | commitdiff | tree

Roland Schulz [Thu, 9 Nov 2017 22:43:45 +0000 (14:43 -0800)]

Rename load1DualHsimd to loadU1DualHsimd

The documentation did not require any alignment, the test did not use
alignment, and no implementation required any alignment, but the name
suggested that alignment is required.
The only current usage had 2-wide alignment, but requiring that
would make the function less general without any advantage.

Change-Id: I651c1327a3febc368cb4b039ad226d0771770e60

commit | commitdiff | tree

Roland Schulz [Thu, 9 Nov 2017 23:02:41 +0000 (15:02 -0800)]

AVX: Improve load1DualHsimd

instr+uop: 4->3, throughput/port-pressure(on 5): 3->1
(IACA numbers for IVB-SKL)

Change-Id: Id768cb951dcbace1473448fcd63fa7d40b0e7da6

commit | commitdiff | tree

Roland Schulz [Fri, 10 Nov 2017 02:08:08 +0000 (03:08 +0100)]

Revert "Use -mavx2 -mfma instead of -march with AVX2"

This reverts commit 062a6b81498b61b2bfc4ec7441b844d76aae445b.

Reason for revert: Breaks support for ICC (16-18) which doesn't have -mavx2 or -mfma.

Change-Id: I01cf3e9db332a405fd9419b6382240f5fcecf633

commit | commitdiff | tree

Aleksei Iupinov [Thu, 9 Nov 2017 18:07:30 +0000 (19:07 +0100)]

Rename synchronous GPU transfer functions to match the asynchronous ones

Change-Id: I5cb8e9cab208c1d0c62f985ec3140540ea427fb2

commit | commitdiff | tree

Mark Abraham [Wed, 8 Nov 2017 11:22:57 +0000 (12:22 +0100)]

Prepared t_mdatoms for using vector

Wrapped it in another C++ class because the group-scheme kernels
compile as plain C and this permits the contained t_mdatoms to
be unmodified. The class has responsibility for maintaining the
allocations for any of the fields of t_mdatoms that need to be
managed with a std::vector plus perhaps an allocator.

Change-Id: I6fef70beeb8d43f3e048cec02380f8ebf8153ecb
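
The wrapping approach described here can be sketched as follows. This is an
illustration under assumptions, not the GROMACS class: PlainAtoms is a
hypothetical stand-in for a C-compatible struct such as t_mdatoms, and
AtomsOwner is a hypothetical name for the owning C++ wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a struct that code compiled as plain C must see
// unchanged: only an int and a raw pointer.
struct PlainAtoms
{
    int    nr;
    float* chargeA;
};

// Hypothetical wrapper that owns the storage in a std::vector and keeps the
// raw pointer inside the plain struct pointing at it.
class AtomsOwner
{
public:
    AtomsOwner() = default;
    // Copying would leave the copied raw pointer aimed at the other
    // object's vector, so it is disabled in this sketch.
    AtomsOwner(const AtomsOwner&) = delete;
    AtomsOwner& operator=(const AtomsOwner&) = delete;

    // Resizes the owned storage and refreshes the C-visible view.
    void resize(int n)
    {
        assert(n >= 0);
        chargeA_.resize(static_cast<std::size_t>(n));
        atoms_.nr      = n;
        atoms_.chargeA = chargeA_.data();
    }

    // Pointer that can be handed to code expecting the plain struct.
    PlainAtoms* view() { return &atoms_; }

private:
    std::vector<float> chargeA_;
    PlainAtoms         atoms_{ 0, nullptr };
};
```

The plain struct keeps its layout, so code compiled as C can keep using it,
while allocation and lifetime are handled by the vector. A custom allocator
could be supplied as the vector's second template argument.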

commit | commitdiff | tree

Mark Abraham [Mon, 6 Nov 2017 07:45:49 +0000 (08:45 +0100)]

Introduce HostAllocationPolicy

This permits host-side standard containers and smart pointers to have
their contents placed in memory suitable for efficient GPU transfer.

The behaviour can be configured at run time during simulation setup,
so that if we are not running on a GPU, then none of the buffers that
might be affected actually are. The downside is that all such
containers now have state.

Change-Id: I9367d0f996de04c21312cef2081cc08148f80561
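
A stateful allocator of the kind this message describes might look roughly
like the sketch below. It is not the GROMACS HostAllocationPolicy:
PinningMode and ConfigurableAllocator are hypothetical names, and the
sketch always uses ordinary heap memory where a real implementation would
call a pinning API.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical run-time choice made during simulation setup.
enum class PinningMode
{
    Pageable,
    PinnedIfSupported
};

// Minimal stateful allocator: the mode is stored in the allocator object,
// which is the "containers now have state" downside noted in the message.
template<typename T>
class ConfigurableAllocator
{
public:
    using value_type = T;

    explicit ConfigurableAllocator(PinningMode mode = PinningMode::Pageable) : mode_(mode) {}

    // Converting constructor required so containers can rebind the allocator.
    template<typename U>
    ConfigurableAllocator(const ConfigurableAllocator<U>& other) : mode_(other.mode())
    {
    }

    T* allocate(std::size_t n)
    {
        // A real implementation would branch on mode_ here and request
        // page-locked memory; this sketch always uses the normal heap.
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t /*n*/) noexcept { ::operator delete(p); }

    PinningMode mode() const { return mode_; }

private:
    PinningMode mode_;
};

// Allocators compare equal when memory from one can be freed by the other;
// in this sketch that is taken to mean they share the same mode.
template<typename T, typename U>
bool operator==(const ConfigurableAllocator<T>& a, const ConfigurableAllocator<U>& b)
{
    return a.mode() == b.mode();
}

template<typename T, typename U>
bool operator!=(const ConfigurableAllocator<T>& a, const ConfigurableAllocator<U>& b)
{
    return !(a == b);
}
```

A container is then declared as
std::vector<float, ConfigurableAllocator<float>> and constructed with an
allocator instance carrying the mode chosen at setup time.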

commit | commitdiff | tree

Roland Schulz [Mon, 30 Oct 2017 18:33:33 +0000 (11:33 -0700)]

ICC should use ZMM if code anyhow uses ZMM

Change-Id: Iaea73df12065b3d4ba1974e48b864f44c9b7fe44

commit | commitdiff | tree

Roland Schulz [Wed, 25 Oct 2017 19:24:42 +0000 (12:24 -0700)]

Fix scalar blend

Change-Id: I580af279cdba494ec13029259e4fd0867a7e5ea2

commit | commitdiff | tree

Magnus Lundborg [Mon, 23 Oct 2017 11:20:00 +0000 (13:20 +0200)]

Update to TNG v 1.8.1

Fixes #2187 and #2250.

Change-Id: Icf81d5f3ce916e984750e1511d32e16ebc45b6f9

commit | commitdiff | tree

Szilárd Páll [Wed, 18 Oct 2017 15:01:54 +0000 (17:01 +0200)]

Use -mavx2 -mfma instead of -march with AVX2

This was (likely) only a workaround for some early gcc version that did
not support correct AVX2 code-generation with just the -mavx2 -mfma
flags. However, just as with other SIMD flavors, with AVX2 too we should
not request arch-specific tuning just to get the desired SIMD flavor
enabled.

Change-Id: Ib0c6388bebcffbf0719b438451d3943f51fba4a4

commit | commitdiff | tree

Mark Abraham [Tue, 7 Nov 2017 02:26:32 +0000 (03:26 +0100)]

Reform gmx_pme_pp alloc and use vector

Introduced a helper struct for describing the partner PP ranks.

Reduced some of the conditional compilation.

Updated some naming from node to rank.

Fixed over-use of charge_pp.

Change-Id: I00b59dd116740721ed707af4242c0d44f1615d56

commit | commitdiff | tree

Mark Abraham [Mon, 6 Nov 2017 07:43:24 +0000 (08:43 +0100)]

Introduce gmxopencl.h

This header wraps the different ways to include the main OpenCL header
on different platforms, including suppressions for the warnings about
usage of deprecated API elements. NVIDIA only officially supports the
version with the deprecated elements, so we need to continue to use it.

Change-Id: Ie24f20d43272e1747bcbd693815e96cc200d5f50

commit | commitdiff | tree

Szilárd Páll [Fri, 20 Oct 2017 20:26:25 +0000 (22:26 +0200)]

Merge common nbnxn CUDA/OpenCL GPU wait code-paths

The entire GPU wait including timing accumulation as well as staging
data reduction of the nonbonded GPU modules has been unified by
including a single templated version of the code into the common header.
Code has only been moved and changed in minor ways when necessary (e.g.
for the rvec reduction).

Change-Id: Ic9c9690be58a78f92ca99d2af30068e19c19cc6c

commit | commitdiff | tree

Mark Abraham [Tue, 10 Oct 2017 10:09:10 +0000 (10:09 +0000)]

Test clang on ARM in nightly matrix

Also suppress lots of compiler warnings from useless use of
__vectorcall on this target for this compiler.

ARM are targeting clang for future development, so hopefully this
either isn't needed or will work in future. Either way, this change
will continue to do the right thing.

Change-Id: I211952a24aefee8434cc6b32322f359b2a22687b

commit | commitdiff | tree

Szilárd Páll [Mon, 23 Oct 2017 14:11:46 +0000 (16:11 +0200)]

Add wallcycle timer for the PME GPU F reduction

Change-Id: I85185f2acdf3ebdcbac109ef723eb458bc0e9008

commit | commitdiff | tree

Szilárd Páll [Fri, 20 Oct 2017 17:52:13 +0000 (19:52 +0200)]

Split off nbnxn GPU timing and staging reduction

Code reorganization that moves the timing related functions as well as
energy and shift force reduction into separate functions in both CUDA
and OpenCL versions of nbnxn_gpu_wait_for_gpu().

Change-Id: Ic5c9694d9de7f80a772e97f5c9e05bab77a3b82a

commit | commitdiff | tree

Mark Abraham [Tue, 7 Nov 2017 01:52:18 +0000 (02:52 +0100)]

Improve PME includes

Changing an internal ewald-module header for GPU support should not
lead to files outside that module needing to be recompiled. Moved enum
declarations for use outside the module to the header file that
declares such things. Restored necessary includes that were being
satisfied transitively from the internal header, that were prematurely
removed in fae8902688dc48be56e.

Change-Id: I18c3146e80aba9ad0a2c485f2355bc214cbb083c

commit | commitdiff | tree

Szilárd Páll [Fri, 20 Oct 2017 18:55:45 +0000 (20:55 +0200)]

Deduplicate CUDA and OpenCL timer struct

The struct is identical in both CUDA/OpenCL so it's better placed in a
common header, but this needs to be an internal-only header as it pulls
in CUDA dependencies.

Change-Id: I907d68b7c298f2ba0e7a1af2baf4819f637e2f2e

commit | commitdiff | tree

David van der Spoel [Thu, 12 Oct 2017 07:06:44 +0000 (09:06 +0200)]

Fixed check for water in gen_vsite.cpp

Pdb2gmx would break when generating virtual sites if water oxygens
were not named OW. Now checking for the atomnumber instead.

Fixes #2268

Change-Id: I326f683e4940ad02351dcbe0c00e266a82b203f6
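
The difference between the two checks can be shown with a small sketch. It
does not reproduce gen_vsite.cpp: AtomInfo and both function names are
hypothetical, and only the oxygen test is shown, not the full water
detection.

```cpp
#include <string>

// Hypothetical minimal atom record for illustration.
struct AtomInfo
{
    std::string name;
    int         atomicNumber;
};

constexpr int c_oxygenAtomicNumber = 8;

// Name-based check: fails for water oxygens that are not called "OW".
bool isOxygenByName(const AtomInfo& atom)
{
    return atom.name == "OW";
}

// Element-based check: independent of the naming convention in the input.
bool isOxygenByElement(const AtomInfo& atom)
{
    return atom.atomicNumber == c_oxygenAtomicNumber;
}
```

An oxygen named "O" is missed by the name-based check but recognized by the
element-based one.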

commit | commitdiff | tree

Mark Abraham [Fri, 3 Nov 2017 01:05:42 +0000 (02:05 +0100)]

Merge "Merge branch release-2016"

commit | commitdiff | tree

Berk Hess [Wed, 1 Nov 2017 16:21:48 +0000 (17:21 +0100)]

Fix Ekin at step 0 with COM removal

The kinetic energy at step 0 was computed from the velocities without
the center of mass velocity removed. This could cause a relatively
large jump in kinetic energy, especially for small systems.
Now compute_globals is called twice with COM removal so we get
the correct kinetic energy.

Appropriate mdrun tests for energy-conserving integrators are also added.

Change-Id: I87ab08d21a35621735ab3c65fc50af9992120be3
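
The size of the effect can be seen in a small numerical sketch. This is not
the compute_globals code: Vec3, centerOfMassVelocity and kineticEnergy are
hypothetical helpers that compute the kinetic energy with and without
subtracting the center-of-mass velocity first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Mass-weighted average velocity of all particles.
Vec3 centerOfMassVelocity(const std::vector<double>& mass, const std::vector<Vec3>& v)
{
    assert(mass.size() == v.size() && !mass.empty());
    Vec3   momentum{ 0.0, 0.0, 0.0 };
    double totalMass = 0.0;
    for (std::size_t i = 0; i < mass.size(); i++)
    {
        totalMass += mass[i];
        for (int d = 0; d < 3; d++)
        {
            momentum[d] += mass[i] * v[i][d];
        }
    }
    for (int d = 0; d < 3; d++)
    {
        momentum[d] /= totalMass;
    }
    return momentum;
}

// Sum of 0.5*m*|v|^2, optionally measured relative to the COM velocity.
double kineticEnergy(const std::vector<double>& mass, const std::vector<Vec3>& v, bool removeCom)
{
    Vec3 vcom{ 0.0, 0.0, 0.0 };
    if (removeCom)
    {
        vcom = centerOfMassVelocity(mass, v);
    }
    double ekin = 0.0;
    for (std::size_t i = 0; i < mass.size(); i++)
    {
        for (int d = 0; d < 3; d++)
        {
            const double rel = v[i][d] - vcom[d];
            ekin += 0.5 * mass[i] * rel * rel;
        }
    }
    return ekin;
}
```

For two unit-mass particles moving along x with velocities 1 and 3, the
kinetic energy is 5 with the center-of-mass motion included and 1 after it
is removed, which shows how large the jump can be for a small system.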

commit | commitdiff | tree

David van der Spoel [Tue, 31 Oct 2017 12:25:56 +0000 (13:25 +0100)]

New mdp input for electric fields.

New format for MDP input for electric fields that is consistent
with the manual and that is comprehensible.

Change-Id: I5f9f434080f5217d2473c16377aee962692b9ee9
