BioD PNPI Git Repos - alexxy/gromacs.git/log

Berk Hess [Fri, 24 Nov 2017 15:09:22 +0000 (16:09 +0100)]

Disabled CUDA timings

CUDA timings are incorrect with multiple streams and currently
we cannot query for other streams in the non-bonded or PME module,
so disabled them by default. Added a GMX_ENABLE_GPU_TIMING env.var.
Removed deprecated env.vars.

Change-Id: I55ab98d7fea8fa90782e8346ad73b2d2a2171a1d

Szilárd Páll [Wed, 22 Nov 2017 02:09:00 +0000 (03:09 +0100)]

Add GPU sharing post-submit config

Also added some TODOs to the top of the list with the thinking that
these would be beneficial to test code that we're changing now (e.g.
memory management code).

Change-Id: Ic8ed571f571d6755fecb753d04a73f25f507ad22

Mark Abraham [Fri, 20 Nov 2015 19:43:58 +0000 (20:43 +0100)]

Update developer guide

Imported and updated more material from the wiki. Included coverage of
some recent discussion points on C++11 and preprocessor use.

Change-Id: I72dc1e656f329fe7de9feadc77de84bc39b6eb29

Mark Abraham [Fri, 3 Nov 2017 02:20:21 +0000 (03:20 +0100)]

Activate PME on GPUs

Adds support for a run with a single rank, running NB and PME on the
same GPU.

Fixed bug in thread-MPI gather and gatherv functions. Real MPI does
not require that the send buffer is non-NULL if the send count is
zero, and the thread-MPI docs have not documented that this is an
intended extra requirement. This would be a problem only if e.g.
memcpy(dest,src,0) itself dereferenced src.

TODO user docs

Change-Id: Iace9e720c0958eaacaaa81307f1fe7324fa4c9ac

Aleksei Iupinov [Fri, 24 Nov 2017 14:20:50 +0000 (15:20 +0100)]

Remove outdated (already implemented) pinning TODO

Change-Id: I122b93497aa168aeb354bd63059398c1738fd110

Berk Hess [Tue, 7 Nov 2017 11:23:09 +0000 (12:23 +0100)]

Speed up nbnxn buffer clearing

Change-Id: If1b58589aee4022cff9e6f120526b4e844967501

Mark Abraham [Fri, 3 Nov 2017 01:33:01 +0000 (02:33 +0100)]

Separate management of GPU contexts from modules

Tasks from modules might share GPU contexts across either tasks, or
thread-MPI ranks, so init and free operations can't be the
responsibility of the modules themselves.

Simplified the error reporting for init and free. Knowledge of the
rank ID might help in diagnosing issues in some cases, but should
(later) be the responsibility of a proper framework to catch errors
during initialization across MPI ranks.

Moved the GPU profiling cleanup back to where it was intended to be,
before some earlier refactoring had left it somewhere not-quite-right.

Change-Id: I682a1b1c7058cbebb41805dba05e688cbee18c2a

Erik Lindahl [Mon, 13 Nov 2017 17:48:05 +0000 (10:48 -0700)]

Detect AVX-512 FMA units to choose best SIMD

Add a test program that times AVX-512 code to
detect single vs. dual AVX-512 FMA units.
Added CMake code to always compile this file
with AVX-512 flags, both at CMake configuration
and runtime.
Tested to work on both AVX2 and AVX-512 hardware
with dual FMAs, and by manually faking single
FMA units, but when we get access to hardware
with a single AVX-512 FMA unit we need to check
that we produce the correct result.

Change-Id: I6240e864bc77f95085c5cd3303a84ab581eb3662

Mark Abraham [Thu, 2 Nov 2017 18:51:16 +0000 (19:51 +0100)]

Add PME integration test

As support for executing PME on GPUs is integrated, this test will
make a simple way to ensure things work as expected regardless of
what build type and hardware is present.

Change-Id: I2ab95b6f84eef9fe18b6858a8886221706706a14

Mark Abraham [Wed, 23 Aug 2017 14:06:44 +0000 (16:06 +0200)]

Extend task assignment code

Existing behaviour is largely unchanged, apart from some details of
how conditions that prevent task assignment are handled, and when.

However it is not feasible in the longer term to continue to implement
a way for gmx mdrun -gpu_id to imply the thread-MPI rank split, so
that is disabled now, along with a useful error message. Instead, for
both real and thread MPI, -gpu_id now limits the available GPU IDs
(issuing an error if there are any duplicates), somewhat like
CUDA_VISIBLE_DEVICES. The new mdrun -gputasks option specifies a full
GPU task assignment, and must be accompanied by a choice of ranks and
what kind of device receives tasks of each type. Documentation is
updated accordingly.

Aspects of the implementation anticipate the extension to support
long-ranged PME interactions on GPUs, and others in future, so that
the task assignment on a node now takes the form of a container of
tasks, potentially of different types, on each rank of the node. A
flat vector of ints is no longer sufficient.

Errors e.g. from inconsistent user input are now handled with
exceptions, so that the runner can take the responsibility of
reporting those correctly, rather than always aborting the program at
the point where the issue is detected.
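
The duplicate check on -gpu_id and the exception-based error reporting
described above can be sketched roughly as follows. This is only an
illustration in Python, not the mdrun implementation: the function and
exception names are hypothetical, and a comma-separated ID format is
assumed for simplicity.

```python
class InconsistentInputError(ValueError):
    """Hypothetical exception type for inconsistent user input."""


def parse_gpu_ids(gpu_id_string):
    """Parse a comma-separated list of GPU IDs, rejecting duplicates.

    The comma-separated format is an assumption made for this sketch.
    """
    ids = [int(token) for token in gpu_id_string.split(",") if token.strip()]
    seen = set()
    for gpu_id in ids:
        if gpu_id in seen:
            # Raise instead of aborting, so the caller decides how to report.
            raise InconsistentInputError(
                "GPU ID %d was specified more than once" % gpu_id)
        seen.add(gpu_id)
    return ids


def run(gpu_id_string):
    """Stand-in for the runner, which takes responsibility for reporting."""
    try:
        return "using GPUs %s" % parse_gpu_ids(gpu_id_string)
    except InconsistentInputError as error:
        return "error: %s" % error
```

The point of the design is that the parsing code only detects the
problem, while the code that calls it chooses how and where to report it.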

gmx tune_pme now explicitly only supports the new form of -gpu_id,
though it would not be difficult to support -gputasks if there
was need.

Change-Id: I0c149913bd43418d374171f5f95dad7f25d3cfe4

Aleksei Iupinov [Fri, 24 Nov 2017 10:55:16 +0000 (11:55 +0100)]

Mention gmx_install_headers() in the developer manual

Change-Id: I703e63b7e8978f23963107044396331cbfe94d7e

Aleksei Iupinov [Tue, 21 Nov 2017 17:20:40 +0000 (18:20 +0100)]

Make FFT real-space grid use CUDA pinning for mixed PME mode

Change-Id: Ic0decb839285cfffd0de6da667e99a08b201183a

Berk Hess [Thu, 23 Nov 2017 15:07:34 +0000 (16:07 +0100)]

Limit SMT with PME on GPU

For small numbers of atoms per core, SMT can seriously deteriorate
performance when running both non-bondeds and PME on GPU.
With fewer than 10000 atoms per core, SMT is now always off by default
with PME on GPU and auto settings.
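
A minimal sketch of the rule stated above, assuming it depends only on
the atoms-per-core ratio and on whether PME runs on the GPU. The function
name is hypothetical and the real decision in mdrun involves other
settings that are not modelled here.

```python
SMT_ATOMS_PER_CORE_THRESHOLD = 10000  # value quoted in the commit message


def pme_gpu_rule_disables_smt(num_atoms, num_cores, pme_on_gpu):
    """Return True if this rule alone would switch SMT off by default."""
    atoms_per_core = num_atoms / num_cores
    return pme_on_gpu and atoms_per_core < SMT_ATOMS_PER_CORE_THRESHOLD
```

For example, 80000 atoms on 16 cores gives 5000 atoms per core, so the
rule applies when PME runs on the GPU.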

Change-Id: I1a6b83bc81f68e89bf443e2b0ddb1fde44e2361d

Aleksei Iupinov [Mon, 20 Nov 2017 17:14:30 +0000 (18:14 +0100)]

Pin coordinates, forces and charges for PME on GPU

This pins almost all the required buffers for both PME-only and PP+PME
ranks. The only non-pinned buffer is the FFT grid for not yet enabled
mixed mode. ChargesA in md_atoms now checks whether PME is actually being
used on the rank to avoid the unpin assertion.

Change-Id: Ia4c1b7673cc5c32c8d3d3fcf252d9d923b94128d

Berk Hess [Fri, 13 Oct 2017 12:15:55 +0000 (14:15 +0200)]

Update note in manual on SD

The comment in the SD section about Berendsen was outdated.
Added a few sentences on equilibration/damping of modes.

Change-Id: I12595e881572526637019879ff0bb8ef872e57d6

Berk Hess [Thu, 23 Nov 2017 13:28:31 +0000 (14:28 +0100)]

Correct PME GPU timing naming

Change-Id: I4a39d246cecbfc7a84608dd9c450d57393f1248d

Szilárd Páll [Thu, 23 Nov 2017 11:51:17 +0000 (12:51 +0100)]

Move GPU FFT/gather launch after GPU nonbondeds

Changing the scheduling order makes sure that the launch overhead of GPU
FFTs does not keep the CPU busy and thereby delay the nonbonded launch and
its overlap with PME.

Change-Id: I85cdb97cd7d7be6f5314da3a69d68616d8a93092

Pascal Merz [Tue, 28 Mar 2017 07:27:40 +0000 (01:27 -0600)]

Proof of concept: Physical validation suite

This is a proof of concept for a physical validation suite for GROMACS
(https://redmine.gromacs.org/issues/2070). It is kept separate from the
remaining code checks, for a number of reasons:
* Like the regression tests, it is separate from the code, i.e. it
  calls the complete binary from an external script.
* Unlike the regression tests, we are not comparing to the results from
  earlier versions, but to what is expected physically / mathematically.
* Unlike the existing tests, we are not able to keep these tests in the
  "seconds, not minutes" time frame, rather aiming for "hours, not days".
  They should therefore be run periodically, but probably not for every
  build.
  Also, given the long run time, it will in many cases be necessary to
  separate running of the systems (e.g. to run them at a specific time, or
  on a different resource), so the make script gives the option to
  - prepare run files and an execution script,
  - analyze already present simulations,
  - or prepare, run and analyze in one go.
* Since those tests cannot be run at the same frequency as the current
  tests, they are kept strictly opt-in via `-DGMX_PHYSICAL_VALIDATION=ON`
  (see below for more details).
* Different tests need different simulation run-times - convergence tests
  need significantly shorter simulations than statistical ensemble tests.
  The tests could hence be run at different frequencies, with short tests
  being integrated in the Jekyll environment, and longer tests being
  triggered manually only.

USAGE WITH CMAKE / CTEST
* Run `cmake` in your build directory as usual.
* Run make:
    * All previously existing build targets are unchanged, including
      `make check`
    * `make check-phys` builds the main binaries, then runs the physical
       validation tests, expecting the simulations to have already run.
    * `make check-all` is the combination of `make check` and
      `make check-phys`. It builds the main binaries and the test
      targets, then runs the Google tests, the regression tests and the
      physical validation tests.
    * `make check-phys-prepare` builds the main binaries, then prepares
      the validation tests by creating a subdirectory
      tests/physicalvalidation in the build directory, which contains the
      GROMACS input files as well as a bash file `run_simulations.sh` to
      run the necessary simulations outside of the build environment.
    * `make check-phys-run` prepares and runs the necessary simulations,
      and runs the tests once the simulations are finished.
A typical sequence of commands to run the physical validation tests would
hence either be
> cmake -DGMX_PHYSICAL_VALIDATION=ON ..
> make
> make check-phys-prepare
> # run simulations, possibly on external resource
> make check-phys
or, to run everything locally (WARNING - can take several hours!)
> cmake -DGMX_PHYSICAL_VALIDATION=ON ..
> make
> make check-phys-run

* Keeping the current `make check` target unchanged required some
  changes to the CMake files in the `tests/` directory.
* Note that currently `make check-phys-run` needs just under one hour
  to run on my workstation (10-core Xeon processor)

ADDED FILES
All additional files are located in the `tests/physicalvalidation` folder
or one of its subfolders.
* The physical validation is done with a python script (python3, but
  fully backwards compatible with 2.7), which can also be called
  independently of CTest. The script is called `gmx_physicalvalidation.py`
  and is relatively extensively documented. Try running it with `-h` and
  `--tests` to see the available options.
* The input files for the systems are stored in the `systems/` subfolder,
  containing a .top file containing the topology, a .gro file containing
  the starting structure, and a .mdp file containing the input options.
  The .mdp files are kept as minimal as possible to be sensitive to changes
  in GROMACS standard values.
* The physical validation script and the CMake files get information about
  the systems via the `systems.json` (single precision) and `systems_d.json`
  (double precision) files. These json-formatted files contain the system
  names, the tests to be performed on the systems, as well as additional
  options where needed.
* The actual computation of the tests is performed by a python package
  stored in the `physical_validation` folder. This is simply a snapshot of
  our code-independent physical validation package.

SYSTEMS:
Currently, the script is only running a few systems, checking convergence
of energy conservation in an NVE system with decreasing time step, and the
ensembles generated by a few thermostatting and barostatting algorithms. We
have systems ready covering a broader combination of settings, but kept the
list shorter for this proof-of-concept to keep runtime short.

OPT-IN VIA `-DGMX_PHYSICAL_VALIDATION=ON`
* Physical validation tests are now strictly opt-in via
  `-DGMX_PHYSICAL_VALIDATION=ON`, as discussed in the developer telco.
  * If `-DGMX_PHYSICAL_VALIDATION=OFF` (default value),
    * `make check`, `make test` and a direct call to `ctest` are
      identical to before the inclusion of physical validation, i.e.
      they run google tests and (if available) the regression tests.
    * `make check-phys` does not run any tests, but issues a note that
      nothing will be checked since `-DGMX_PHYSICAL_VALIDATION=OFF`.
    * `make check-all` is identical to `make check`, but issues a note
      that physical validation is not available since
      `-DGMX_PHYSICAL_VALIDATION=OFF`.
  * If `-DGMX_PHYSICAL_VALIDATION=ON`,
    * `make check` is identical to before the inclusion of physical
      validation, i.e. it runs google tests and (if available) the
      regression tests.
    * `make test` and a direct call to `ctest` run all available tests,
      i.e. google tests, possible regression tests, physical validation
      tests.
    * `make check-phys` runs only the physical validation tests.
    * `make check-all` runs all available tests like `make test`, with
      the difference that the tests and all dependencies are built
      before running the tests.
  Since `GMX_PHYSICAL_VALIDATION` is `OFF` by default, existing setups
  should see no change in behavior.

Change-Id: I3155da64b7dfb43a42236030dce1399d36cb9655

Viveca Lindahl [Wed, 22 Nov 2017 10:15:50 +0000 (11:15 +0100)]

Fixed typo and added correct units in the friction tensor equation.

The friction should have units of energy*time/length^2.
Adding the unit of beta makes the units correct.
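
As a quick dimensional check, assume the friction tensor is the inverse
temperature beta times the time integral of the force autocorrelation.
The notation below is illustrative and not copied from the manual.

```latex
\eta_{\mu\nu} = \beta \int_0^\infty
    \langle \delta F_\mu(t)\, \delta F_\nu(0) \rangle \, dt,
\qquad
[\eta_{\mu\nu}]
  = \frac{1}{\text{energy}}
    \cdot \frac{\text{energy}^2}{\text{length}^2}
    \cdot \text{time}
  = \frac{\text{energy} \cdot \text{time}}{\text{length}^2}
```

Here each force carries units of energy/length and beta carries units of
1/energy, which is why including beta gives the stated units.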

Change-Id: I9457e4724cefe87258ccc7f663982b349d4d4219

Viveca Lindahl [Tue, 21 Nov 2017 15:58:15 +0000 (16:58 +0100)]

Added friction tensor to AWH manual section

This also changes the lambda dimensional index from 'd' to 'mu'
for the whole AWH section.

Also fixes a typo.

Change-Id: I09203d7eabcb23482476c6d16c8dd75f5832a662

Mark Abraham [Thu, 9 Nov 2017 00:23:01 +0000 (17:23 -0700)]

Use much less PaddedRVecVector and more ArrayRef of RVec

Only code that handles allocations needs to know the concrete type of
the container. In some cases that do need the container type,
templating on the allocator will be needed in future, so that is
arranged here. This prepares for changing the allocator for state->x
so that we can use one that can be configured at run time for
efficient GPU transfers.

Also introduced PaddedArrayRef to use in code that relies on the
padding and/or alignedness attributes of the PaddedRVecVector. This
keeps partial type safety, although a proper implementation of such a
view should replace the current typedef.

Had to make some associated changes to helper functionality to
use more ArrayRef, rather than rely on the way rvec pointers could
decay to real pointers.

Used some compat::make_unique since that is better style.

Change-Id: I1ed3feb016727665329e919433bece9773b46969

Aleksei Iupinov [Tue, 21 Nov 2017 11:02:24 +0000 (12:02 +0100)]

Fix CUDA release build warning

Change-Id: I840c991f4eef394071626bfdcfcd14192e97a778

Aleksei Iupinov [Tue, 21 Nov 2017 07:19:26 +0000 (08:19 +0100)]

Fix pinned memory status checker tests

Change-Id: I69df298c5c022a21fe7e96d201adfe1d4dfb8563

Erik Lindahl [Sat, 11 Nov 2017 23:04:20 +0000 (16:04 -0700)]

Separate SIMD flag detection from management

This provides separate cmake functions to detect the flags
required to compile each SIMD instruction set. These
functions do not automatically add the SIMD flags to the
default flags, in order to make it possible to detect flags
for many SIMD instruction sets and use different flags for
different files.

Change-Id: I469a0cf0fafe3793d14dbcf0e72fd18f44a3bfe8

Mark Abraham [Sat, 11 Nov 2017 19:50:20 +0000 (12:50 -0700)]

Support pinning in HostAllocator

We want the resize / reserve behaviour to handle page locking that is
useful for efficient GPU transfer, while making it possible to avoid
locking more pages than required for that vector. By embedding the
pin()/unpin() behaviour into malloc() and free() for the allocation
policy, this can be safely handled in all cases.

Additionally, high-level code can now choose for any individual vector
when and whether a pinning policy is required, and even manually
pin and unpin in any special cases that might arise.

When using the policy that does not support pinning, we now use
AlignedAllocator, so that we minimize memory consumption.

Change-Id: I807464222c7cc7718282b1e08204f563869322a0

Viveca Lindahl [Mon, 28 Mar 2016 12:34:35 +0000 (14:34 +0200)]

Add force correlation to AWH module

This change adds the calculation of force correlation statistics
during an AWH biased simulation.
The main quantity of interest is the time-integrated force
correlation, also known as the friction tensor
(see e.g. http://dx.doi.org/10.1103/PhysRevLett.108.190602).
The friction tensor defines a metric on the coordinate space and
the local volume element of this metric is a useful measure for
determining which regions need more or less sampling.
gmx awh prints the friction (tensor) and can also still process
energy files without friction data.

Change-Id: I164be4665004dea5b250e3c7ac135ac1c1cbd783

Paul Bauer [Wed, 11 Oct 2017 14:19:22 +0000 (16:19 +0200)]

First general additions to the documentation

Made a general pass through the documentation and fixed a number
of issues I found with links not being where they should be
or pointing nowhere.

Also tried parsing the log files from the linkchecker
script in the build test.

Change-Id: I29fabd0b824d90f205c13855a85b139f7b789da3

Viveca Lindahl [Mon, 28 Mar 2016 12:31:42 +0000 (14:31 +0200)]

Add reading and writing to AWH module

This change adds IO to the AWH module. AWH writes coordinate
free energies and distributions to an energy file block. The
reading is handled by a new tool gmx awh.

Change-Id: Ie30991bca376c2a648371db771fc5dfd8fca3715

Berk Hess [Mon, 23 Oct 2017 07:39:32 +0000 (09:39 +0200)]

Prevent PME tuning excessive grid scaling

We limit the maximum grid scaling to a factor of 1.8. This allows
plenty of room for shifting work from PME on CPU to short-range
GPU kernels, but avoids excessive scaling, which gives diminishing
returns in performance at the cost of a significant increase in power
consumption and communication volume (which, with fluctuating network
load, may not show up during tuning), and also limits load balancing.
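
The cap can be pictured with a small sketch. It assumes that the scaling
factor is the ratio of a candidate grid spacing to the initial spacing.
The function name and the candidate list are hypothetical, and this is
not the PME tuning code itself.

```python
MAX_GRID_SCALING = 1.8  # limit quoted in the commit message


def allowed_grid_spacings(base_spacing, candidate_spacings):
    """Keep only candidates within MAX_GRID_SCALING of the base spacing."""
    limit = base_spacing * MAX_GRID_SCALING
    return [s for s in candidate_spacings if s <= limit]
```

With a base spacing of 0.12, candidates above 0.216 would be discarded
before any timing is attempted.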

Change-Id: I85c02478faa6b67c063b6e1b45a9ac1755b2d81e

Mark Abraham [Sun, 19 Nov 2017 21:50:03 +0000 (22:50 +0100)]

Fix nightly matrix config

Since we added cmake warning detection, this has been failing, because
the older clang versions do not have openmp support.

Change-Id: Idbc47fd9ce3c5f329fb20ac36da48ea932f6ac03

Berk Hess [Sun, 12 Nov 2017 12:10:54 +0000 (13:10 +0100)]

Add SIMD for AWH

When using the convolved potential with AWH, a large number of double
precision exp() functions need to be evaluated at every step. These
are now SIMD accelerated.

Change-Id: If1e3a916469c4fd7e26740123009ae59b7927667

Viveca Lindahl [Sun, 12 Nov 2017 21:46:40 +0000 (22:46 +0100)]

Add AWH section to manual.

Brief theoretical background and practical aspects of AWH.

Change-Id: Ic794006b00b107b6c9d22a1cfabff4646c87ae4e

Viveca Lindahl [Mon, 28 Mar 2016 12:27:33 +0000 (14:27 +0200)]

Add AWH biasing module + tests

The AWH (Accelerated weight histogram) method is an adaptive biasing
method used for overcoming free energy barriers and calculating
free energies (see http://dx.doi.org/10.1063/1.4890371). Although
AWH can in general bias any system parameter, this change only
implements biasing of reaction coordinates. The actual force
distribution and coordinate handling is taken care of by the pull
code. AWH interacts with the pull code by registering itself as
the external potential module for the coordinate that should be
AWH biased. The AWH code sets the potential and force for those
coordinates.

The Grid test checks that the neighborhoods are correct.
The Bias tests check the force, bias and free energy values
for the initial and final phases, with MC and convolved force,
and with and without skipping updates.

Change-Id: I202f58f7042e8e63c9d708fdcaca6da7e8a4022e

Mark Abraham [Tue, 24 Oct 2017 15:40:58 +0000 (17:40 +0200)]

Fix FindCUDA.cmake for -Wundef

CUDA 9.0 issues large numbers of -Wundef warnings from its internal
headers. FindCUDA.cmake should be including such headers as
"system" headers, so it is modified to do that.

Fixes #2276

Change-Id: I36ca1208a0597215bf2f1e38d849786e9ac2fed4

Magnus Lundborg [Tue, 14 Nov 2017 11:06:09 +0000 (12:06 +0100)]

Added Magnus Lundborg as contributor in the manual.

I noticed I was not listed so I shamelessly added myself.

Change-Id: Idf589cabc319dab20abb1a40d73ce14692e71de2

Aleksei Iupinov [Tue, 7 Nov 2017 12:40:17 +0000 (13:40 +0100)]

Asynchronous CUDA transfers are asserted to always use pinned memory

A few tests are also added for the memory status checker function.

Change-Id: Ifd6c9b2a0ad8ad8c325dc553534314e81e5d8bd7

Mark Abraham [Wed, 8 Nov 2017 11:25:41 +0000 (12:25 +0100)]

Use HostAllocationPolicy for chargeA

Change-Id: Id230a0571ff8e4f8f69e347a3aea4e3c44413676

Mark Abraham [Fri, 17 Nov 2017 18:58:39 +0000 (11:58 -0700)]

Add checking function for whether a buffer is pinned

This is useful for several kinds of tests proposed.

Change-Id: If9fdd29e73f16299190b5485f473f6388aab9ec9

Mark Abraham [Mon, 6 Nov 2017 08:28:01 +0000 (09:28 +0100)]

Improve handling of PME GPU force buffer

Managed it with the HostAllocator, and moved the responsibility
for its lifetime to the PME GPU staging structure. The buffer
does not use CUDA pinning yet.

Change-Id: Ia6fdbdb2509137fec1c6cf2a4ac8c04b1696b58f

Aleksei Iupinov [Fri, 10 Nov 2017 15:04:35 +0000 (16:04 +0100)]

Make gpu_utils-test build with GMX_CLANG_CUDA

Same workarounds are applied to libgpu_utilstest as for libgromacs.
Renamed libgpu_utilstest target to gpu_utilstest_cuda to avoid the
double "lib" prefix in the filename.

Refs #2259, #2293

Change-Id: I16b07a13ce2dca30079a889e2b314483d82d3674

Berk Hess [Tue, 31 Oct 2017 10:03:02 +0000 (11:03 +0100)]

Also print 1x1 pair-list setup to log

mdrun now prints the equivalent 1x1 pair-list setup in addition
to the NxM list setup. This is to clarify that we can use short
pair list buffers because of our cluster setup.
The list setup is now also printed in case we have a single list.
Removed the note on needing to increase nstlist with a GPU when
we automatically change nstlist.
Changed pick_nbnxn_kernel and nbnxn_atomdata_init to use mdlog
to get correct spacing between paragraphs.

Also cleaned up the verletbuf list setup getter functions.

Change-Id: Ic7b5967b0a62aee9fee9837f60a134fd571ff405

Aleksei Iupinov [Thu, 9 Nov 2017 18:01:43 +0000 (19:01 +0100)]

Rename and expose "generic" GPU memory transfer functions

Dropped the "_generic" suffix from the names. Made the sync/async
argument an enum class instead of boolean.
Made PME use synchronous versions of the functions for unit tests.
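
The readability gain from replacing a boolean with an enumeration can be
illustrated with a small Python sketch. The names `TransferKind` and
`copy_to_device` are hypothetical stand-ins, not the GROMACS functions,
and the "device" buffer here is an ordinary list.

```python
from enum import Enum


class TransferKind(Enum):
    """Hypothetical replacement for a bare True/False 'async' argument."""
    SYNC = "sync"
    ASYNC = "async"


def copy_to_device(device_buffer, host_data, kind):
    """Toy transfer: copy the data and report whether it is usable at once."""
    if not isinstance(kind, TransferKind):
        # A bare boolean is rejected, so call sites must spell out intent.
        raise TypeError("kind must be a TransferKind, not %r" % (kind,))
    device_buffer[:] = host_data
    return kind is TransferKind.SYNC
```

A call such as `copy_to_device(buf, data, TransferKind.ASYNC)` documents
itself at the call site, whereas `copy_to_device(buf, data, True)` would
not.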

Change-Id: I5fd2490d58370d9f0405aea1a74237fa8107cbab

Erik Lindahl [Sun, 12 Nov 2017 13:14:10 +0000 (06:14 -0700)]

Only issue FFT warning messages on changes

Similar to other CMake modules, we should only issue
warnings at the first invocation, or if the FFT library
was changed.

Change-Id: I6dba59f1021984d9a744a55d797814c1c9d89b20

Roland Schulz [Sat, 8 Jul 2017 00:40:48 +0000 (17:40 -0700)]

PME-gather: 4xN SIMD

Speedup on KNL 11% for spread/gather (3% total) on ion-channel

Change-Id: I1a0624408b4e8f7bd441dfe2c260f80d211351d0

Mark Abraham [Tue, 24 Oct 2017 15:40:58 +0000 (17:40 +0200)]

Import cmake Modules/FindCUDA.cmake

CUDA 9.0 issues large numbers of -Wundef warnings from its internal
headers. FindCUDA.cmake should be including such headers as "system"
headers, so to prepare for a patch where it is modified to do that,
this commit imports that file from v3.4.3 of the CMake repository,
because that is a choice likely to work with all future versions of
CMake.

It needs some supporting cmake files that are included unmodified,
so GROMACS does not assert copyright on those. The main FindCUDA.cmake
file is modified only to be able to find those files

Refs #2276

Change-Id: I69ad39dc805648a6cc5e27bb7fcd229f5f2a538a

Roland Schulz [Thu, 9 Nov 2017 22:43:45 +0000 (14:43 -0800)]

Rename load1DualHsimd to loadU1DualHsimd

The documentation didn't require any alignment, the test didn't use
alignment, and none of the implementations required any alignment.
But the name suggested that alignment is required.
The only current usage had 2-wide alignment, but requiring that
would make the function less general without any advantage.

Change-Id: I651c1327a3febc368cb4b039ad226d0771770e60

Roland Schulz [Thu, 9 Nov 2017 23:02:41 +0000 (15:02 -0800)]

AVX: Improve load1DualHsimd

instr+uop: 4->3, throughput/port-pressure(on 5): 3->1
(IACA numbers for IVB-SKL)

Change-Id: Id768cb951dcbace1473448fcd63fa7d40b0e7da6

Roland Schulz [Fri, 10 Nov 2017 02:08:08 +0000 (03:08 +0100)]

Revert "Use -mavx2 -mfma instead of -march with AVX2"

This reverts commit 062a6b81498b61b2bfc4ec7441b844d76aae445b.

Reason for revert: Breaks support for ICC (16-18) which doesn't have -mavx2 or -mfma.

Change-Id: I01cf3e9db332a405fd9419b6382240f5fcecf633

Aleksei Iupinov [Thu, 9 Nov 2017 18:07:30 +0000 (19:07 +0100)]

Rename synchronous GPU transfer functions to match the asynchronous ones

Change-Id: I5cb8e9cab208c1d0c62f985ec3140540ea427fb2

Mark Abraham [Wed, 8 Nov 2017 11:22:57 +0000 (12:22 +0100)]

Prepared t_mdatoms for using vector

Wrapped it in another C++ class because the group-scheme kernels
compile as plain C and this permits the contained t_mdatoms to
be unmodified. The class has responsibility for maintaining the
allocations for any of the fields of t_mdatoms that need to be
managed with a std::vector plus perhaps an allocator.

Change-Id: I6fef70beeb8d43f3e048cec02380f8ebf8153ecb

Mark Abraham [Mon, 6 Nov 2017 07:45:49 +0000 (08:45 +0100)]

Introduce HostAllocationPolicy

This permits host-side standard containers and smart pointers to have
their contents placed in memory suitable for efficient GPU transfer.

The behaviour can be configured at run time during simulation setup,
so that if we are not running on a GPU, then none of the buffers that
might be affected actually are. The downside is that all such
containers now have state.

Change-Id: I9367d0f996de04c21312cef2081cc08148f80561

Roland Schulz [Mon, 30 Oct 2017 18:33:33 +0000 (11:33 -0700)]

ICC should use ZMM if code anyhow uses ZMM

Change-Id: Iaea73df12065b3d4ba1974e48b864f44c9b7fe44

Roland Schulz [Wed, 25 Oct 2017 19:24:42 +0000 (12:24 -0700)]

Fix scalar blend

Change-Id: I580af279cdba494ec13029259e4fd0867a7e5ea2

Magnus Lundborg [Mon, 23 Oct 2017 11:20:00 +0000 (13:20 +0200)]

Update to TNG v 1.8.1

Fixes #2187 and #2250.

Change-Id: Icf81d5f3ce916e984750e1511d32e16ebc45b6f9

Szilárd Páll [Wed, 18 Oct 2017 15:01:54 +0000 (17:01 +0200)]

Use -mavx2 -mfma instead of -march with AVX2

This was (likely) only a workaround for some early gcc version that did
not support correct AVX2 code-generation with just the -mavx2 -mfma
flags. However, just as with other SIMD flavors, with AVX2 too we should
not request arch-specific tuning just to get the desired SIMD flavor
enabled.

Change-Id: Ib0c6388bebcffbf0719b438451d3943f51fba4a4

Mark Abraham [Tue, 7 Nov 2017 02:26:32 +0000 (03:26 +0100)]

Reform gmx_pme_pp alloc and use vector

Introduced a helper struct for describing the partner PP ranks.

Reduced some of the conditional compilation.

Updated some naming from node to rank.

Fixed over-use of charge_pp.

Change-Id: I00b59dd116740721ed707af4242c0d44f1615d56

Mark Abraham [Mon, 6 Nov 2017 07:43:24 +0000 (08:43 +0100)]

Introduce gmxopencl.h

This header wraps the different ways to include the main OpenCL header
on different platforms, including suppressions for the warnings about
usage of deprecated API elements. NVIDIA only officially supports the
version with the deprecated elements, so we need to continue to use it.

Change-Id: Ie24f20d43272e1747bcbd693815e96cc200d5f50

Szilárd Páll [Fri, 20 Oct 2017 20:26:25 +0000 (22:26 +0200)]

Merge common nbnxn CUDA/OpenCL GPU wait code-paths

The entire GPU wait including timing accumulation as well as staging
data reduction of the nonbonded GPU modules has been unified by
including a single templated version of the code into the common header.
Code has only been moved and changed in minor ways when necessary (e.g.
for the rvec reduction).

Change-Id: Ic9c9690be58a78f92ca99d2af30068e19c19cc6c

Mark Abraham [Tue, 10 Oct 2017 10:09:10 +0000 (10:09 +0000)]

Test clang on ARM in nightly matrix

Also suppress lots of compiler warnings from useless use of
__vectorcall on this target for this compiler.

ARM are targeting clang for future development, so hopefully this
either isn't needed or will work in future. Either way, this change
will continue to do the right thing.

Change-Id: I211952a24aefee8434cc6b32322f359b2a22687b

Szilárd Páll [Mon, 23 Oct 2017 14:11:46 +0000 (16:11 +0200)]

Add wallcycle timer for the PME GPU F reduction

Change-Id: I85185f2acdf3ebdcbac109ef723eb458bc0e9008

Szilárd Páll [Fri, 20 Oct 2017 17:52:13 +0000 (19:52 +0200)]

Split off nbnxn GPU timing and staging reduction

Code reorganization that moves the timing related functions as well as
energy and shift force reduction into separate functions in both CUDA
and OpenCL versions of nbnxn_gpu_wait_for_gpu().

Change-Id: Ic5c9694d9de7f80a772e97f5c9e05bab77a3b82a

Mark Abraham [Tue, 7 Nov 2017 01:52:18 +0000 (02:52 +0100)]

Improve PME includes

Changing an internal ewald-module header for GPU support should not
lead to files outside that module needing to be recompiled. Moved enum
declarations for use outside the module to the header file that
declares such things. Restored necessary includes that were being
satisfied transitively from the internal header and that were prematurely
removed in fae8902688dc48be56e.

Change-Id: I18c3146e80aba9ad0a2c485f2355bc214cbb083c

Szilárd Páll [Fri, 20 Oct 2017 18:55:45 +0000 (20:55 +0200)]

Deduplicate CUDA and OpenCL timer struct

The struct is identical in both CUDA/OpenCL so it's better placed in a
common header, but this needs to be an internal-only header as it pulls
in CUDA dependencies.

Change-Id: I907d68b7c298f2ba0e7a1af2baf4819f637e2f2e

David van der Spoel [Thu, 12 Oct 2017 07:06:44 +0000 (09:06 +0200)]

Fixed check for water in gen_vsite.cpp

Pdb2gmx would break when generating virtual sites if water oxygens
were not named OW. Now checking for the atomnumber instead.

Fixes #2268

Change-Id: I326f683e4940ad02351dcbe0c00e266a82b203f6
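
A minimal sketch of the idea behind this fix, not the actual gen_vsite.cpp
code: an oxygen is recognised by its element (atomic number 8) rather than by
the atom name OW. The Atom struct and both function names are hypothetical and
exist only for illustration.

```cpp
#include <string>

// Hypothetical, simplified atom record; this is not the GROMACS atom type.
struct Atom
{
    std::string name;         // naming depends on the force field, e.g. "OW" or "OH2"
    int         atomicNumber; // 8 for oxygen, 1 for hydrogen
};

// Fragile check: only works when the input uses the name "OW".
bool isOxygenByName(const Atom &atom)
{
    return atom.name == "OW";
}

// More robust check: depends only on the element, not on the naming scheme.
bool isOxygenByAtomicNumber(const Atom &atom)
{
    return atom.atomicNumber == 8;
}
```

With a water oxygen named OH2, the name-based check fails while the check on
the atomic number still succeeds.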

Mark Abraham [Fri, 3 Nov 2017 01:05:42 +0000 (02:05 +0100)]

Merge "Merge branch release-2016"

Berk Hess [Wed, 1 Nov 2017 16:21:48 +0000 (17:21 +0100)]

Fix Ekin at step 0 with COM removal

The kinetic energy at step 0 was computed from the velocities without
the center of mass velocity removed. This could cause a relatively
large jump in kinetic energy, especially for small systems.
Now compute_globals is called twice with COM removal so we get
the correct kinetic energy.

Appropriate mdrun tests for energy-conserving integrators are also added.

Change-Id: I87ab08d21a35621735ab3c65fc50af9992120be3
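
The size of the jump described above can be illustrated with a small,
self-contained sketch. It is not the compute_globals code; all names below are
hypothetical. It relies only on the identity that the total kinetic energy
equals the kinetic energy relative to the center of mass plus
0.5 * M * v_com^2, so removing the COM velocity lowers the kinetic energy by
exactly that last term.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Mass-weighted mean velocity of all particles.
Vec3 centerOfMassVelocity(const std::vector<double> &mass, const std::vector<Vec3> &v)
{
    Vec3   vcom{{0.0, 0.0, 0.0}};
    double totalMass = 0.0;
    for (std::size_t i = 0; i < mass.size(); ++i)
    {
        for (int d = 0; d < 3; ++d)
        {
            vcom[d] += mass[i] * v[i][d];
        }
        totalMass += mass[i];
    }
    for (int d = 0; d < 3; ++d)
    {
        vcom[d] /= totalMass;
    }
    return vcom;
}

// Sum over particles of 0.5 * m * |v|^2.
double kineticEnergy(const std::vector<double> &mass, const std::vector<Vec3> &v)
{
    double ekin = 0.0;
    for (std::size_t i = 0; i < mass.size(); ++i)
    {
        for (int d = 0; d < 3; ++d)
        {
            ekin += 0.5 * mass[i] * v[i][d] * v[i][d];
        }
    }
    return ekin;
}

// Subtracts the COM velocity from every particle velocity in place.
void removeCenterOfMassVelocity(const std::vector<double> &mass, std::vector<Vec3> *v)
{
    const Vec3 vcom = centerOfMassVelocity(mass, *v);
    for (Vec3 &vi : *v)
    {
        for (int d = 0; d < 3; ++d)
        {
            vi[d] -= vcom[d];
        }
    }
}
```

For a small system the COM term can be a large fraction of the total, which is
why the kinetic energy has to be computed after the removal.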

David van der Spoel [Tue, 31 Oct 2017 12:25:56 +0000 (13:25 +0100)]

New mdp input for electric fields.

New format for MDP input for electric fields that is consistent
with the manual and that is comprehensible.

Change-Id: I5f9f434080f5217d2473c16377aee962692b9ee9

Aleksei Iupinov [Tue, 31 Oct 2017 22:51:19 +0000 (23:51 +0100)]

Replace math.h by cmath includes in cpp files

Partially fixes #2285 (for non-GPU build)

Change-Id: I638a0b8ba5e4e04e00730b01640ac7c6a41834ed
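
As a brief illustration of what the cmath header provides in C++ (this is not
code from the change itself, and halfNorm is a hypothetical function): the
functions in namespace std are overloaded on the argument type, so a float
argument yields a float result and std::abs on a double does not go through
an integer version.

```cpp
#include <cmath>
#include <type_traits>

// <cmath> declares overloads in namespace std for float, double and long double.
static_assert(std::is_same<decltype(std::sqrt(2.0f)), float>::value,
              "std::sqrt(float) returns float");
static_assert(std::is_same<decltype(std::abs(-1.5)), double>::value,
              "std::abs(double) returns double");

// Hypothetical example function that stays in single precision throughout.
float halfNorm(float x, float y)
{
    return 0.5f * std::sqrt(x * x + y * y);
}
```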

Mark Abraham [Thu, 2 Nov 2017 09:43:11 +0000 (10:43 +0100)]

Merge branch release-2016

Ensured fix for gmx compare cmp_atoms went to the right code.

Change-Id: Iabc8ec03e7ebc45517f63697c3e7dea12b3f5398

Berk Hess [Thu, 2 Nov 2017 08:42:39 +0000 (09:42 +0100)]

Add missing Ewald correction for pme-user

With coulomb-type = pme-user, the Ewald mesh energy was not subtracted
leading to (very) incorrect Coulomb energies and forces.

Fixes #2286

Change-Id: Idfef9896d484e254264150e718c5516a832a2ad4

Paul Bauer [Mon, 30 Oct 2017 14:40:16 +0000 (15:40 +0100)]

Small change to LaTeX manual generation

Removed the gmxlite if statements in the pdf manual source files. They
made it more difficult to generate the new markup style files and are
apparently not needed.

Change-Id: Ica401f103c8f9682c7a45bdd90aa8680db7ff56a

Mark Abraham [Mon, 30 Oct 2017 17:13:07 +0000 (18:13 +0100)]

Fix thread-MPI rank choice for orientation restraints

Only a single rank is supported, so that must be what the thread-MPI
code will choose. There's another check later on that catches the
multi-rank MPI case.

Change-Id: I9ccf5fbe958fc0c004a89ebc92a352460e9cba1f

Aleksei Iupinov [Wed, 1 Nov 2017 11:35:53 +0000 (12:35 +0100)]

Remove unused PME GPU declarations

Change-Id: If64bcf73e825f6cd5ba48345f931c9dd25241046

Aleksei Iupinov [Wed, 1 Nov 2017 11:31:52 +0000 (12:31 +0100)]

Move pme_gpu_finish_computation() documentation to the declaration

Change-Id: I4970424eb5108e51c6e8b00b55a60854900e16b9

Paul Bauer [Wed, 1 Nov 2017 11:44:51 +0000 (12:44 +0100)]

Fixing missing references in web documentation

Change-Id: Ifca209c15f4cec3fed24e2070df8fa85320d02dd

Aleksei Iupinov [Tue, 31 Oct 2017 16:15:51 +0000 (17:15 +0100)]

Fix erroneous PME GPU "step" namings

Previous PME GPU code/documentation assumed a single PME computation
per MD step, while there can actually be several. This change
replaces erroneous "step" names in the PME GPU module with
"(PME) computation" and similar.

Change-Id: Id230e848e0db0648a429bfc35a59106d1db1f7c9

Mark Abraham [Wed, 25 Oct 2017 10:08:01 +0000 (12:08 +0200)]

Improve handling of GPU IDs

Shifted responsibility for handling parsing of mdrun -gpu_id to early
in the runner, rather than as part of the assignment process.

Moved utility string handling + tests to taskassignment module, since
they only supported this process. Updated string handling in gmx
tune_pme to use more std::string and use the new
functionality. makeGpuIds will be used to replace the code in
assign_rank_gpu_ids in a subsequent patch.

Change-Id: I8d39cc69d0f96ac395858ed7cbe9f2947081b384
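
Parsing the GPU ID string once, early, might look roughly like the sketch
below. This is an assumption-laden illustration, not the makeGpuIds
implementation: the function name parseGpuIdString is hypothetical, and the
input format (one decimal digit per GPU ID, e.g. "0011") is assumed here for
simplicity.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a string such as "0011" into {0, 0, 1, 1},
// treating each character as one GPU ID (assumed input format).
std::vector<int> parseGpuIdString(const std::string &gpuIdString)
{
    std::vector<int> gpuIds;
    gpuIds.reserve(gpuIdString.size());
    for (char c : gpuIdString)
    {
        if (c < '0' || c > '9')
        {
            // Reject invalid input up front, before any task assignment happens.
            throw std::invalid_argument("Invalid character in GPU ID string: '"
                                        + std::string(1, c) + "'");
        }
        gpuIds.push_back(c - '0');
    }
    return gpuIds;
}
```

Validating the string at this point means later stages can work with a plain
vector of integers and do not need their own error handling for malformed
input.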

Aleksei Iupinov [Fri, 27 Oct 2017 13:37:47 +0000 (15:37 +0200)]

Simplify PME GPU synchronization code

Most synchronization events are removed; synchronization is mostly
done by a single stream synchronization call at the end of the step.

Change-Id: Ia793f2623d81ae8e3f6dfb5c84a6a636e422d982

Aleksei Iupinov [Tue, 31 Oct 2017 15:42:25 +0000 (16:42 +0100)]

Reuse epbcXY logic

Change-Id: I9ec7521b050521932b64b2b08a58c7b530975fb0

Szilárd Páll [Tue, 31 Oct 2017 14:11:38 +0000 (15:11 +0100)]

Fix nstlist increase warning print

The log file warning print had a buggy conditional which this commit
fixes.

Change-Id: Ic106fa3fba54b2c394818e3a642f462d2675a2b1

Szilárd Páll [Mon, 16 Oct 2017 15:40:23 +0000 (17:40 +0200)]

Check CUDA available/compiled code compatibility

Added an early check to detect when the gmx binary embeds neither code
compatible with the GPU device it tries to use nor PTX that could be
JIT-compiled for it.

Additionally, if the user manually sets GMX_CUDA_TARGET_COMPUTE=20 and
no later SM or COMPUTE but runs on >2.0 hardware, we'd be executing
JIT-ed Fermi kernels with incorrect host-side code assumptions
(e.g. the amount of shared memory allocated or the texture type).
This change also prevents such cases.

Fixes #2273

Change-Id: I5472b1a33e584a75f451e21e9fd25992633fbea9
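
The general compatibility rule behind the first check can be sketched in plain
C++. This is not how the change detects the problem (that is done against the
CUDA runtime); it only models the rule that compiled device code for sm_XY
runs on devices of the same major version with an equal or higher minor
version, while PTX for compute_XY can be JIT-compiled for any device of equal
or higher compute capability. All type and function names are hypothetical,
and the Fermi-specific case from the second paragraph is not modelled.

```cpp
#include <vector>

// Compute capability X.Y of a device or of a compilation target.
struct ComputeCapability
{
    int ccMajor;
    int ccMinor;
};

// Hypothetical description of the device code embedded in a binary.
struct EmbeddedGpuCode
{
    std::vector<ComputeCapability> sass; // compiled device code (sm_XY targets)
    std::vector<ComputeCapability> ptx;  // intermediate code (compute_XY targets)
};

// Returns true if some embedded code can execute on the given device.
bool canRunOn(const EmbeddedGpuCode &code, ComputeCapability device)
{
    for (const ComputeCapability &sm : code.sass)
    {
        // Binary code is only compatible within one major version.
        if (sm.ccMajor == device.ccMajor && sm.ccMinor <= device.ccMinor)
        {
            return true;
        }
    }
    for (const ComputeCapability &ptx : code.ptx)
    {
        // PTX can be JIT-compiled for any equal or newer device.
        if (ptx.ccMajor < device.ccMajor
            || (ptx.ccMajor == device.ccMajor && ptx.ccMinor <= device.ccMinor))
        {
            return true;
        }
    }
    return false;
}
```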

Mark Abraham [Wed, 25 Oct 2017 09:45:32 +0000 (11:45 +0200)]

Update treatment of GPU compatibility data structure

Now we only construct the vector of compatible GPUs once per mdrun,
and are less coupled to hw_info and gpu_info structs.

Change-Id: I181f0486d0ea1670de7a85046c94c1fef83dce17

Szilárd Páll [Tue, 31 Oct 2017 14:18:27 +0000 (15:18 +0100)]

Fix nstlist increase warning print

The log file warning print had a buggy conditional which this commit
fixes.

NOTE: skip when merging, upstream fix submitted separately.

Change-Id: Id85223a3f762bbab26525a60987870d77cd5a01c

David van der Spoel [Mon, 30 Oct 2017 08:03:13 +0000 (09:03 +0100)]

Fixed mdp output from electric field code.

Added two new tests for MDP output.

Fixes #2258

Change-Id: I495454bd2349be836c1a3ef5985288a996abf20e

Aleksei Iupinov [Mon, 30 Oct 2017 10:56:17 +0000 (11:56 +0100)]

Fix reference mode build unused function warnings

Change-Id: Ibd1ad83c5dbeffe86e47156d456d78ab1ab8aeeb

Berk Hess [Sun, 29 Oct 2017 21:20:54 +0000 (22:20 +0100)]

Remove unused sign parameter from dih_angle()

Change-Id: I88a73ca49b6acfc59b4baf0d847aa81542a870ca

Roland Schulz [Fri, 13 Oct 2017 18:49:46 +0000 (11:49 -0700)]

ArrayRef: Replace fromVector with subArray

Creating ArrayRef from iterators is potentially dangerous,
because it is incorrect for non-contiguous containers.

arrayRefFromVector(v.begin()+start, v.begin()+start+length)
is replaced with
ArrayRef<T>(v).subArray(start, length)

Also:
- Combine all conversion constructors
  Removes code duplication and makes conversion more powerful
  (e.g. base pointer or containers with allocators).
- remove fromPointers and arrayRefFromPointers
  Wasn't used by any code
- remove fromArray and replace with arrayRefFromArray

Change-Id: I05ad6b285ece58739d9f5bce48f9ecf4ade3454e
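
The subArray pattern can be sketched with a minimal non-owning view. This is a
stand-in written for illustration, not the gmx::ArrayRef implementation; the
class name SimpleArrayRef is hypothetical. The view can only be built from a
std::vector, whose storage is contiguous, and sub-ranges are expressed as a
start index plus a length rather than as a pair of iterators.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for a non-owning view of contiguous memory.
template<typename T>
class SimpleArrayRef
{
public:
    // Construct from a std::vector, which guarantees contiguous storage.
    SimpleArrayRef(std::vector<T> &v) : begin_(v.data()), end_(v.data() + v.size()) {}

    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }

    T &operator[](std::size_t i) const { return begin_[i]; }

    // View of `length` elements starting at index `start`.
    // The caller must ensure that start + length <= size().
    SimpleArrayRef subArray(std::size_t start, std::size_t length) const
    {
        return SimpleArrayRef(begin_ + start, begin_ + start + length);
    }

private:
    // Pointer-pair construction is kept private so that callers cannot
    // build a view from an arbitrary, possibly non-contiguous, range.
    SimpleArrayRef(T *begin, T *end) : begin_(begin), end_(end) {}

    T *begin_;
    T *end_;
};
```

With this sketch, SimpleArrayRef<int>(v).subArray(1, 3) refers to elements 1
to 3 of v without copying them.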

David van der Spoel [Fri, 13 Oct 2017 16:36:27 +0000 (18:36 +0200)]

Added option -water tips3p to pdb2gmx.

Fixes #2272

Change-Id: Ibfc63009767fd667df51ff10041791268351e1ca

Aleksei Iupinov [Fri, 27 Oct 2017 11:01:19 +0000 (13:01 +0200)]

Bring PME GPU/CUDA internal structure names to CamelCase

This only does mechanical renaming (e.g. pme_gpu_settings_t to
PmeGpuSettings). Any meaningful renames will be done separately.

Change-Id: I7ea2af94fd0212ff6edcf433ff21842c5bbb67b0

Mark Abraham [Tue, 24 Oct 2017 19:59:45 +0000 (21:59 +0200)]

Fix and update hw_info

Stopped using typedef struct (so later we can put a vector into the
struct).

Managed the memory using a unique_ptr, and made the interface reflect
that it is a file static, rather than something that is owned by
e.g. the runner.

Amended docs to clarify the sense of "global."

Change-Id: I1ce9bc42e03668498051b59aaeeb9e50a9f6f762
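
The ownership arrangement described here, a file static managed by a
unique_ptr and exposed only through functions, can be sketched as follows.
This is an illustration under stated assumptions, not the GROMACS code: the
HardwareInfo struct, both function names and the placeholder core count are
hypothetical, and the sketch is not thread-safe.

```cpp
#include <memory>

// Hypothetical stand-in for the detected hardware description.
struct HardwareInfo
{
    int numLogicalCores = 0;
};

namespace
{
// File-static owner: visible only inside this translation unit, so no
// caller can take over ownership of the object.
std::unique_ptr<HardwareInfo> g_hardwareInfo;
} // namespace

// Runs the detection on the first call and returns the cached result afterwards.
const HardwareInfo &detectHardware()
{
    if (!g_hardwareInfo)
    {
        g_hardwareInfo.reset(new HardwareInfo);
        g_hardwareInfo->numLogicalCores = 4; // placeholder instead of real detection
    }
    return *g_hardwareInfo;
}

// Releases the cached result; a later call to detectHardware() detects again.
void freeHardwareInfo()
{
    g_hardwareInfo.reset();
}
```

Callers receive a const reference, which makes it clear that they use the
data but do not own it.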

Aleksei Iupinov [Fri, 27 Oct 2017 13:09:04 +0000 (15:09 +0200)]

Use new/delete for gmx_pme_t

Change-Id: I176b1d26d484514c65cae412c474b65410191d38

Aleksei Iupinov [Thu, 26 Oct 2017 15:04:34 +0000 (17:04 +0200)]

Simplify PME data handling in runner

Differing ownership of the PME data for PME-only and other ranks
is now hidden behind a reference. gmx_pme_init() now returns
a pointer to the allocated structure.

Change-Id: Ia9c5117a0db43a6564298dd621cf9254f0423acf

Aleksei Iupinov [Thu, 26 Oct 2017 14:48:06 +0000 (16:48 +0200)]

Make PME tuning logic more readable

Change-Id: Ie53693a84264ed33c17894aa551cf476a3ced26b

Berk Hess [Sun, 29 Oct 2017 21:12:20 +0000 (22:12 +0100)]

Remove incorrect comment for CHARMM tips3p

Change-Id: I383e28a7b75aa3654a65d15358820a28f9163308

Aleksei Iupinov [Thu, 26 Oct 2017 11:25:23 +0000 (13:25 +0200)]

Remove unused PME grid dump debug functions

Change-Id: Iac748080fdf29e6f35ecf37de2b968e70c72605e

Mark Abraham [Thu, 26 Oct 2017 08:36:13 +0000 (10:36 +0200)]

Fix hw detection more

gmx_hardware_detect was called in response to the GoogleTest environment
SetUp function, so the cleanup for its global should occur in response
to the corresponding TearDown function. Both those should be virtual.
Thus the hardwareInfo should not be in a smart pointer called by a
destructor that might be called at a different point from TearDown.

The new getter function and the callback that handles making the first
call to it conform better to GoogleTest's recommendation to arrange to
call AddGlobalTestEnvironment from main() rather than rely on static
initialization.

Made hardwareInit a non-member function because that improves
encapsulation.

Change-Id: I2f8e14ecc1707bf31d023a4eb4fea0a20543910b

Aleksei Iupinov [Thu, 26 Oct 2017 08:36:48 +0000 (10:36 +0200)]

Replace a few asserts with GMX_ASSERT's

Change-Id: I18e614de57fc06f3faabc687140821223bd7c4f4

Aleksei Iupinov [Thu, 26 Oct 2017 11:49:22 +0000 (13:49 +0200)]

Remove defunct PME initialization error code return

The error was never actually returned, and invalid inputs
are already treated with exceptions anyway.

Change-Id: I6063612c3a2e760fb56b7bdf5b1624ab2fc031bd

Mark Abraham [Mon, 9 Oct 2017 11:50:25 +0000 (13:50 +0200)]

Make release matrix work again

Seems we didn't test this matrix when we updated infrastructure some
time ago.

Change-Id: Ib19672db6144bb40f08d2fcace4d43dbd52e6823

Szilárd Páll [Mon, 16 Oct 2017 18:15:25 +0000 (20:15 +0200)]

Reorganize PME GPU launch

Wrapped the first (prep/spread) and second stage (fft/gather) of PME GPU
in functions. Moved the second stage of the regular PME GPU mode to after
the nonbonded x transform to ensure that the transform can overlap with
spread even when the launch overhead of the FFT kernels is high.

Also removed TPI-related PME-GPU launch conditions as this should be
checked much earlier. Noted in the force flags docs that the current
code assumes GMX_FORCE_STATECHANGED is used only with TPI.

Change-Id: I7f765d66c6c4e7e54812b81b2dd23751af0b06b5

Local GROMACS mirror with custom stuff