Mark Abraham [Mon, 16 Oct 2017 06:10:34 +0000 (08:10 +0200)]
Merge branch release-2016
Change-Id: Ia56e987f52e4dee425b12b02940ad9ca18d0c13a
Berk Hess [Sun, 24 Sep 2017 20:27:02 +0000 (22:27 +0200)]
Improve vsite parallel checking
The vsite struct now stores internally whether it has been configured
with domain decomposition. This allows for internal checks on valid
commrec, which have now been added.
The vsite constructor now initializes the atom range to invalid values,
so we can check that the thread splitting has been called before
constructing. This would have caught bug #2257.
Removed the vsite struct from the global construct function argument
list, which simplifies the vsite code in several places and
fixes #2257.
Also some general clean-up: removed some snews, added some camelCasing
and doxygen documentation.
More renaming would be beneficial, but should be a separate commit.
Change-Id: I467ec8b8ebfa0da090d4ac0a1d096ad9fab87eb5
Aleksei Iupinov [Fri, 13 Oct 2017 09:50:19 +0000 (11:50 +0200)]
Relax PME spline computation tolerance in double precision tests
Change-Id: I8c3502dd84e21d20be057d47d4afa589d779eb90
Mark Abraham [Thu, 9 Feb 2017 10:49:38 +0000 (11:49 +0100)]
Update tests for C++11 compiler and standard library
We've started using some more features, so broaden the
range of things for which we check at cmake time.
Also made an explicit error message for older icc that can't handle
newer gcc standard libraries, since this might come up a few times.
Fixes #2116
Change-Id: I3656edb3f7e6f81bbf6ed3ed764bcac56802f87f
Roland Schulz [Wed, 4 Oct 2017 06:48:49 +0000 (23:48 -0700)]
Replace all ConstArrayRef with ArrayRef<const T>
1) Remove the alias itself in arrayref.h.
2) All replacements done automatically using sed:
s#ConstArrayRef<const char \*>#ArrayRef<const char *const>#
s#ConstArrayRef<\(.*\)>#ArrayRef<const \1>#
This worked because "const char*" was the only pointer type used as
template argument.
Change-Id: I5eba895a5dc235b95d77670b4f258e423f64f3b8
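The two sed rules in the ConstArrayRef replacement above can be mirrored in a small C++ sketch to see why "const char *" needs its own rule. This is only an illustration: rewriteConstArrayRef is a hypothetical helper that is not part of GROMACS, and std::regex stands in for sed. Like sed without the g flag, each rule rewrites only the first match in a line.

```cpp
#include <regex>
#include <string>

// Hypothetical helper mirroring the two sed rules, applied in the same order.
std::string rewriteConstArrayRef(const std::string& line)
{
    // Rule 1: pointer special case; the added const must bind to the pointer.
    static const std::regex pointerRule(R"(ConstArrayRef<const char \*>)");
    // Rule 2: general case; (.*) is greedy, like \(.*\) in the sed expression.
    static const std::regex generalRule(R"(ConstArrayRef<(.*)>)");

    std::string result = std::regex_replace(line, pointerRule, "ArrayRef<const char *const>",
                                            std::regex_constants::format_first_only);
    result = std::regex_replace(result, generalRule, "ArrayRef<const $1>",
                                std::regex_constants::format_first_only);
    return result;
}
```

Without the first rule, the general rule would turn ConstArrayRef<const char *> into ArrayRef<const const char *>, which qualifies the characters rather than the pointer elements.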
Roland Schulz [Fri, 22 Sep 2017 20:43:50 +0000 (13:43 -0700)]
Specialize ArrayRef for SimdReal
ArrayRef<SimdReal> maps to a range of aligned memory and returns a
Simd type from operator[] (more precisely a reference to a Simd type).
This allows iterating over memory without explicitly calling
load/store, while also avoiding the undefined behavior (strict aliasing rule)
caused by casting between reals and SimdReals.
Change-Id: I3d00df088669dacc810052cbcaebe15e62e1d530
Magnus Lundborg [Tue, 10 Oct 2017 12:13:45 +0000 (14:13 +0200)]
Do not include headers related to ObservablesHistory
Define destructor for ObservablesHistory to avoid having to include
many extra headers.
Change-Id: I2681b519ace728dc494f967d17db5478af09f5df
Mark Abraham [Tue, 10 Oct 2017 09:22:02 +0000 (09:22 +0000)]
Fix cpuinfo on clang + non-x86
Compilers that pretend to be GCC often define such symbols, and the
support for inline assembly does not compile e.g. on ARM. This broke
CPU detection at cmake time, and subsequent compilation. Probably
introduced by commit 863768a4dad. The latest ARM compiler is based on
clang, so we should fix this.
Also de-duplicated some use of compiler target defines
Change-Id: Ia21363b9c0fe112762750d93b9feea267a34319f
Szilárd Páll [Thu, 12 Oct 2017 15:52:15 +0000 (17:52 +0200)]
Remove the size_t from the PME gather CUDA kernels
Change-Id: If53b9eabc1ac081b33933cc773b5ea932c9e8392
Aleksei Iupinov [Thu, 12 Oct 2017 11:00:59 +0000 (13:00 +0200)]
Remove useless extern CUDA texture reference declarations from PME
These are only accessed from the same compilation unit (pme-spread.cu)
on the device side, and the host side is only using nearby getters.
Change-Id: Ie846193c71142ff5e519e990ef1155b534546a9b
Aleksei Iupinov [Thu, 12 Oct 2017 10:31:45 +0000 (12:31 +0200)]
Revert "Drop NB_ from GMX_CUDA_NB_SINGLE_COMPILATION_UNIT cmake define"
This reverts commit 3880255b0, which was made in confusion
stemming from the combination of multiple CUDA compilation units,
disabling CUDA textures, and the NB CUDA module structure.
The define in question is actually NB-exclusive,
and PME with CUDA does not need to check it to declare
extern texture references. As PME textures are not accessed
from different PME kernels, those extern declarations are removed
in the child change Ie846193c71142ff5e519e990ef1155b534546a9b.
Change-Id: I75a0e62bc92c7161ba0fbf00d8db2f35cef80bc7
Berk Hess [Sat, 30 Sep 2017 21:10:06 +0000 (23:10 +0200)]
Simplify virial handling
The force and virial are tightly connected. This is now expressed
through the new ForceWithVirial object, which is used for algorithms
that compute a separate virial contribution. This clarifies and
simplifies the core mdrun code in several places.
Change-Id: If0f65f1a6f67fb3efc5e4637a183faf4abd5f969
Roland Schulz [Fri, 6 Oct 2017 23:36:50 +0000 (16:36 -0700)]
Require template parameter for load function
The implicit conversion from load(float*) to both float
and SimdFloat caused multiple issues. The primary ones:
- Extra complexity in the implementation of traits, ArrayRef, SimdReference
- required compiler tests for ambiguity
- SimdReal x = f(load(m)) //confusing broadcast if f is scalar function
- x = s*load(m) //error-prone scalar multiply if s is scalar
The new syntax is load<T>(m) in templated functions and load<SimdReal>(m) in
non-templated functions. While this is slightly longer by itself, it is clearer
and does not require storing values in temporaries (no ambiguous overload errors).
It also avoids the need for the load proxies.
Change-Id: I8109e9365e956aaea428ec338b6a810444e03d77
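A minimal sketch of a load function that requires its template parameter, as described in the commit above. WideFloat is a hypothetical 4-wide stand-in for a SIMD type and loadFrom is a made-up helper; neither is the GROMACS implementation. Because T appears only in the return type, it cannot be deduced, so a bare load(m) does not compile.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-wide stand-in for a SIMD register type (not GROMACS SimdFloat).
struct WideFloat
{
    std::array<float, 4> lanes;
};

// Declared only: T must be named explicitly at every call site.
template<typename T>
T load(const float* m);

// Scalar load: read one element.
template<>
inline float load<float>(const float* m)
{
    return m[0];
}

// "SIMD" load: read four consecutive elements.
template<>
inline WideFloat load<WideFloat>(const float* m)
{
    WideFloat v;
    for (std::size_t i = 0; i < v.lanes.size(); i++)
    {
        v.lanes[i] = m[i];
    }
    return v;
}

// Generic code names the type once and works for both cases.
template<typename T>
T loadFrom(const float* m, std::size_t offset)
{
    return load<T>(m + offset);
}
```

With this shape, an expression such as s*load<WideFloat>(m) states at the call site which kind of value is being produced.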
Roland Schulz [Sat, 7 Oct 2017 00:34:02 +0000 (17:34 -0700)]
Use tag for simdLoad
Use the same simdLoad name for all types, in preparation
for removing the need for SimdLoadProxyInternal.
C++ does not support template specialization for
functions, so giving simdLoad a template argument
and specializing on it does not work. By passing a tag as
a second argument, standard overloading can be used.
Change-Id: Iaf42ebb74a3347787bcac3bdfd0ef11db1e333bf
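The tag approach from the commit above can be sketched as follows. LoadTag, simdLoadSketch and WideFloat are hypothetical names chosen for illustration; the actual GROMACS tag type and signatures are not shown in this log. The empty tag argument selects the overload, so one function name serves every result type.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-wide stand-in for a SIMD register type.
struct WideFloat
{
    std::array<float, 4> lanes;
};

// Empty tag type: it only carries the requested result type.
template<typename T>
struct LoadTag
{
};

// Same name for all types; ordinary overload resolution picks by tag.
inline float simdLoadSketch(const float* m, LoadTag<float>)
{
    return m[0];
}

inline WideFloat simdLoadSketch(const float* m, LoadTag<WideFloat>)
{
    WideFloat v;
    for (std::size_t i = 0; i < v.lanes.size(); i++)
    {
        v.lanes[i] = m[i];
    }
    return v;
}

// A single generic front end forwards the requested type as a tag.
template<typename T>
T load(const float* m)
{
    return simdLoadSketch(m, LoadTag<T>());
}
```

Adding a new loadable type then only needs one more overload, with no change to the generic front end.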
Mark Abraham [Thu, 13 Apr 2017 23:31:46 +0000 (01:31 +0200)]
Introduced header for communication to/from PME ranks
No functionality changes. This cleans up some structure, and will be
useful for some modernization, use of std::vector, and then new
allocation strategies to suit PME on GPUs.
Eliminated some things in pme-internal.h by moving some declarations
to a header that can be included by the only two source files that are
interested in PP-PME communication. Now gmx_pmeonly() doesn't have to
pass around a large pile of arguments.
Removed a use of typedef struct, and some function parameter types
that no longer need to specify struct in C++.
Removed some unused PP_PME_* constants.
Change-Id: I51629fb6d91b3a486ef24d1f60065e65261d0376
Aleksei Iupinov [Wed, 11 Oct 2017 16:16:07 +0000 (18:16 +0200)]
Fix clang warnings for PME CUDA kernels
Change-Id: I28f67c70b1ff4611f2456a5935a727c49e10e691
Aleksei Iupinov [Wed, 11 Oct 2017 16:28:24 +0000 (18:28 +0200)]
Relax PME solving test complex grid tolerance
PME CUDA solving change (Ic610e7f) tightened the output grid
tolerance from 50 down to 16 ULPs, making one of the LJPME tests
fail in post-submit. This change relaxes the tolerance to 40 ULPs.
Change-Id: Icd0c1aff868e2d1ecb76522a1a2174b3156fc356
Mark Abraham [Fri, 14 Apr 2017 02:23:44 +0000 (04:23 +0200)]
Cleaned up ewaldcoeff for PME-only ranks
Previously, the runner initialized all kinds of PME ranks with the initial
values of Ewald coefficients. The values passed to gmx_pmeonly were
never read - the variables are used only to store new values, which
happens when the PP rank directs the PME grid to switch grids during
load balancing.
Change-Id: Ibe581a7111239f28f874b43dc13dcc6abd025b60
Aleksei Iupinov [Fri, 25 Aug 2017 17:16:13 +0000 (19:16 +0200)]
CUDA 9/Volta support for PME
Change-Id: Icd5cdf16f9118347179dfcbdd162f0cb39cbdd69
Aleksei Iupinov [Tue, 7 Feb 2017 14:01:54 +0000 (15:01 +0100)]
PME solving - CUDA kernel + unit tests
The CUDA implementation of PME solving is added in pme-solve.cu.
The unit tests for PME CPU solving are extended to work with the CUDA kernel,
using the same reference data.
The CUDA solver supports 2 grid dimension orders: YZX and XYZ
(unlike the CPU one which only supports YZX). This is also tested.
Lennard-Jones solving is not implemented.
The tests iterate over all Gromacs-compatible CUDA GPUs.
Refs #2054
Change-Id: Ic610e7f077f39a64089dd9b80df9905094b10459
Paul Bauer [Mon, 9 Oct 2017 07:48:51 +0000 (09:48 +0200)]
Change to modules for build of web documentation
The modules loaded to build the web documentation with Sphinx have been
incorrect for the minimum version specified by the configuration file.
In particular, the imgmath extension had not been available for
version 1.3 that was indicated as being the minimum version. As
there are no references that I found to any math macros in the files
used to build the docs, I removed the extension to make sure it will
build again. It might be better to have a conditional there, building the
docs without imgmath when using lower versions of sphinx, and having it
active for higher versions.
Changed to require 1.4.1 for now, and added variables that set it
automatically from the information passed to cmake.
Change-Id: Ia329575288e5d622b8e679d76b63759bae54a3b0
Aleksei Iupinov [Fri, 27 Jan 2017 14:49:55 +0000 (15:49 +0100)]
PME force gathering - CUDA kernel + unit tests
The CUDA implementation of PME force gathering for PME order 4 is added
in pme-gather.cu. The unit tests for PME CPU force gathering
(d20a5d36) are extended to work with the CUDA kernel, using
the same reference data. The tests iterate over all Gromacs-compatible
CUDA GPUs.
Ref #2054
Change-Id: I162e3a14cb9aa8ddeac17c5ad1ca709df72b8986
Aleksei Iupinov [Fri, 6 Oct 2017 14:47:42 +0000 (16:47 +0200)]
Drop NB_ from GMX_CUDA_NB_SINGLE_COMPILATION_UNIT cmake define
Update the messages as well; the build setting should not be NB-exclusive.
Change-Id: I6b730ed7471253d50ce3294de86a1e3ce2733210
Aleksei Iupinov [Mon, 5 Dec 2016 16:36:09 +0000 (17:36 +0100)]
PME spline+spread CUDA kernel and unit tests
The CUDA implementation of PME spline computation and charge spreading
for PME order 4 is added in pme-spread.cu.
The unit tests for PME CPU spline/spread stages
(e8cf7c0) are also extended to work with
the PME CUDA kernel, using the same reference data.
The tests iterate over all CUDA GPUs which are compatible with Gromacs.
Refs #2054, #2092.
Change-Id: If5ec49f030b9b94395db28fa454ea25c3efb05d1
Mark Abraham [Tue, 12 Sep 2017 09:34:54 +0000 (11:34 +0200)]
Use clang-5 in Jenkins
Change-Id: Ibff723750ff66629da5066c4de43ed3de1198f19
Berk Hess [Mon, 9 Oct 2017 12:07:10 +0000 (14:07 +0200)]
Disable ARM Neon native rsqrt iteration
Fixes #2261
Change-Id: Iebcdb3f85506b8159c06d9a9a5cb5f5c81ba11c9
Aleksei Iupinov [Mon, 9 Oct 2017 10:47:26 +0000 (12:47 +0200)]
Remove some unnecessary includes from PME
Change-Id: I16da782c6c889fdde6ed81044803edfb367624b3
Aleksei Iupinov [Mon, 26 Sep 2016 10:53:29 +0000 (12:53 +0200)]
PME GPU/CUDA data framework.
This patch adds most of the PME GPU data handling routines
(pme.h and pme-gpu.cpp / pme.cu(h) / pme-gpu-internal.h/cpp),
data structures used on host and on device
(pme-gpu-types.h / pme.cuh).
The PME GPU kernels and their host code will live in separate files.
There is also cuFFT code (pme-3dfft.cu(h)) and
optional CUDA timing events (pme-timings.cu(h)) included.
Currently this is dead code; PME GPU is not actually getting initialized
as the corresponding enum is always set for CPU path, which is asserted.
Change-Id: I9b03f54a2412885e25ea27f17bf2dca1b01f9f78
Mark Abraham [Sat, 7 Oct 2017 12:57:56 +0000 (14:57 +0200)]
Tell cppcheck that .cpp and .cu files are c++
This will avoid known strange behavior with .cu files.
Fixes #2265
Change-Id: I06850509afa11a531fe9b7063368614c81f4a7d1
Roland Schulz [Fri, 22 Sep 2017 17:05:02 +0000 (10:05 -0700)]
Refactor load+SimdLoad*ProxyInternal to reduce duplication
A child patch requires both the const and non-const version.
This would increase the duplication per (un/)aligned load/proxy
from 3 to 6. To avoid this, refactor here to remove the duplication.
Also remove GMX_DISALLOW_COPY_AND_ASSIGN because child patch requires it
to be copyable and macro doesn't work with templated class.
Change-Id: Ib0f6867d2ef75132a1dfa28ea761633adf7d1e8d
Berk Hess [Tue, 3 Oct 2017 11:44:37 +0000 (13:44 +0200)]
Add acceleration correction VCM mode
Minor refactoring in vcm.cpp using templating to reduce code
duplication and improve performance.
Change-Id: I560027fe6f315eede0d7aa573bc22bf12eba4c5d
Berk Hess [Fri, 6 Oct 2017 10:21:13 +0000 (12:21 +0200)]
Fix max_size() in allocator.h
Change-Id: I2d3c0bf2782b59d25f987ed0522d98fd1dc61e15
Berk Hess [Wed, 5 Jul 2017 09:44:42 +0000 (11:44 +0200)]
Add SIMD intrinsics version of simple update
To get better performance in cases where the compiler can't vectorize
the simple leap frog integrator loop and to reduce cache pressure of
the invMassPerDim, introduced a SIMD intrinsics version of the simple
leap-frog update without pressure coupling and one T-scale factor.
To achieve this md->invmass now uses the aligned allocation policy
and is padded by GMX_REAL_MAX_SIMD_WIDTH elements.
Asserts have been added to check for the padding.
Change-Id: I98f766e32adc292403782dc67f941a816609e304
Berk Hess [Wed, 4 Oct 2017 07:36:40 +0000 (09:36 +0200)]
Improve PaddedRVecVector resizing
PaddedRVecVector is now resized with paddedRVecVectorSize() instead
of using natoms+1 in several places in the code.
In preparation for a SIMD intrinsics version of the integrator,
PaddedRVecVector now uses the aligned allocator and has padding
for GMX_REAL_MAX_SIMD_WIDTH width at the end.
do_force() checks for padding of coordinates and forces.
Also extended ArrayRef to allow for different allocators.
Fixes a potential nullptr issue in doPaddedRvecVector().
Change-Id: Ifb8dacf4f41b755b9981e936e462fa5823cff1d8
Aleksei Iupinov [Thu, 28 Sep 2017 13:08:50 +0000 (15:08 +0200)]
Add timing accumulation capability into GpuRegionTimer
Added a TODO to deprecate NB timing structures in favor of new
functionality.
Change-Id: Idb78e5a36a7f372f01378a580a05b928bd728c57
Berk Hess [Thu, 5 Oct 2017 16:31:58 +0000 (18:31 +0200)]
Make SIMD ambiguous test work with float/double only
Fixes #2262
Change-Id: I612aea147a7808a72aa05dca44f588a60c27eea0
Szilárd Páll [Mon, 2 Oct 2017 23:55:09 +0000 (01:55 +0200)]
Refactor scaled box handling for PME/Ewald
The box scaling code for Ewald and related logic is now encapsulated in
a new box scaler class. This also allows moving box scaling
further down the call chain so that logic is reduced in do_force.
Change-Id: Ic5071b825b9d36daca5f49d6f7c6c50261af1e1b
Szilárd Páll [Thu, 5 Oct 2017 01:20:10 +0000 (03:20 +0200)]
Fix pair list array assertion
Was caught by the clang CUDA build.
Refs #2259
Change-Id: I8633e48ea5c33225829f92f60c2c785f16e17aca
Szilárd Páll [Thu, 5 Oct 2017 00:53:15 +0000 (02:53 +0200)]
Avoid -Wmissing-prototypes warnings in clang CUDA builds
Fixes #2259 partially
Change-Id: Ib47530c89cf4b3af5c2b65cfcfa39b9dfe0148b4
Roland Schulz [Wed, 4 Oct 2017 00:32:25 +0000 (17:32 -0700)]
Make ConstArrayRef alias to ArrayRef<const T>
Removes code duplication and makes it easier to extend.
Change-Id: I4ac01e3be89bb9f2ad92937ce19e436ac7f1178b
Mark Abraham [Mon, 2 Oct 2017 16:19:05 +0000 (18:19 +0200)]
Continue removing -nb gpu_cpu
Now that hybrid mode is gone, both local and non-local Verlet-scheme
groups use the same kernel_type and thus both nbat pointers were
always the same. Thus, there's no reason to maintain two of them.
This simplifies and slightly optimizes nbnxn_atomdata_set().
Also fixed some other docs, comments, and logic that were either
already wrong, or are useless with hybrid mode gone.
Change-Id: Id02a11a00553b1df151a1e15b934611e0e15b9f7
Roland Schulz [Wed, 27 Sep 2017 22:15:46 +0000 (15:15 -0700)]
Fix that exp(load(x)) is ambiguous
Because of the template argument added by e34ead15, exp(float) was
preferred over exp(SimdFloat). This causes code which is ambiguous
to compile and creates unintuitive behavior.
Also:
- Add compile tests that these functions don't compile when they
are used ambiguously.
- Add sqrt to scalar_math which was missing
Change-Id: Ic7582764f1fa9b644d4608536100004e5f737462
Roland Schulz [Tue, 3 Oct 2017 19:40:09 +0000 (12:40 -0700)]
Allow non-const access for const ArrayRef
const ArrayRef<T> should not mean that the data contained in the range
is const. If it were used that way, it would behave very poorly as a const
object, because when it is copied it loses its constness. Instead,
ConstArrayRef<T> (or potentially ArrayRef<const T>) should be used
consistently. This makes having const versions of operator[]/at and
begin/end/front/back confusing and inconsistent. The meaning of
const ArrayRef should be that reassigning a new range to the ref
is not allowed. To be consistent, even for a const ArrayRef non-const
access to the data should be allowed.
Change-Id: I7e921c1a5562889bfd26eb0688b475406c87857c
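The semantics argued for in the commit above match those of a const pointer versus a pointer to const. A minimal sketch, assuming a simplified view class named ViewSketch (hypothetical, not the GROMACS ArrayRef): the const-qualified accessor still returns T&, and read-only access is expressed through the element type instead.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical, simplified non-owning view over a contiguous range.
template<typename T>
class ViewSketch
{
public:
    ViewSketch(T* begin, T* end) : begin_(begin), end_(end) {}

    // const-qualified, yet returns T&: a const view still gives mutable access.
    T& operator[](std::size_t i) const { return begin_[i]; }

    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }

private:
    T* begin_;
    T* end_;
};

// Read-only data is requested through the element type, as with ArrayRef<const T>.
static_assert(std::is_same<decltype(std::declval<const ViewSketch<const int>&>()[0]), const int&>::value,
              "a view of const int hands out const references");
```

A const ViewSketch<int> cannot be assigned a new range, but its elements can be modified; copying it cannot silently widen or narrow access, because access is determined by T alone.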
Aleksei Iupinov [Thu, 28 Sep 2017 13:22:55 +0000 (15:22 +0200)]
Remove GPU timing functions superseded by GpuRegionTimer
Change-Id: I451fdd9263e87a567c8b4942a041de6950c75921
Berk Hess [Fri, 29 Sep 2017 10:36:28 +0000 (12:36 +0200)]
Remove a force buffer argument from IForceProvider
For the foreseeable future force providers will produce forces that
should not go into the coord x force type virial calculation.
Therefore the old force argument is now removed and the old f_novirsum
argument is renamed to force. Also removed the confusingly named
withoutVirialContribution_ container (because providers should actually
provide the virial contributions).
Change-Id: I580ac2cef31da6c191a0852c1f3be0c9e09ea1b0
Mark Abraham [Sat, 15 Jul 2017 07:09:08 +0000 (09:09 +0200)]
Enable group-scheme SIMD with GMX_SIMD=AVX2_128
The group-scheme kernels can use AVX instructions from either the
AVX_128_FMA or AVX_256 extensions. But hardware that supports the new
AVX2_128 extensions also supports AVX_256, so we use those extensions
for the group-scheme kernels.
Change-Id: I7728e1d97998509368dfc456dc9c33489b6dfee5
Aleksei Iupinov [Fri, 16 Jun 2017 16:04:51 +0000 (18:04 +0200)]
Add GpuRegionTimer class
The GpuRegionTimer class is implemented in CUDA/OpenCL to help with timing
regions of GPU code. For CUDA, it contains a pair of CUDA start/stop events;
for OpenCL, it contains a fixed-size array of OpenCL events, which are
passed manually into OpenCL calls within the region.
Using this class, cu/cl_timers_t structures and the GPU timing code
are made a bit more uniform.
Change-Id: I8fcd7568653c5bcf3496a2bee227ae03f14a756a
Szilárd Páll [Tue, 3 Oct 2017 14:04:46 +0000 (16:04 +0200)]
Move Ewald splitting-related code to ewald-utils
This code used to be under gromacs/math, but it is an Ewald-only
functionality, so it belongs to the Ewald module.
Change-Id: Ib2d7def243a88b09f328a8ea1bafe27dd7a783d5
Aleksei Iupinov [Tue, 3 Oct 2017 14:43:07 +0000 (16:43 +0200)]
Bring CUDA texture getters documentation up-to-date
Change-Id: Id2b588129287a0e2372fd3564f8a24513ae89e0e
Aleksei Iupinov [Fri, 22 Sep 2017 13:35:28 +0000 (15:35 +0200)]
Template/move CUDA texture cleanup code from NB CUDA module to cudautils.cu
Noted TODO: easy transformation into a GPU table class.
Change-Id: I20d684221fa8304d01ab7fd4a19f2c4469110142
Szilárd Páll [Wed, 25 Jan 2017 01:53:17 +0000 (02:53 +0100)]
Enable compiling CUDA device code with clang
clang can be used as a device compiler by setting GMX_CLANG_CUDA=ON. A
CUDA toolkit (>=7.0) is also needed. Workarounds required:
- texture operations are not supported, use the LDG/direct load-based
fallback in such cases;
- CMake does not natively support clang for CUDA, but it's easy to
convince it by setting CXX as compiler and a few extra flags for *.cu.
Note that clang support is experimental and it is aimed at improving
portability and to allow using clang sanitizers without hassle in
CUDA builds.
TODO/investigate:
- CMake seems to not track some files properly with clang, changes
to nbnxn_cuda_kernel{,_fermi}.cuh do not trigger a recompile (likely
due to the indirect include through a macro in nbnxn_cuda_kernels.cuh).
- Full rebuild is triggered even if only CUDA compile flags are changed.
Change-Id: I3543469d9f0fda37c186ba8bb474980018bd5c54
Berk Hess [Thu, 28 Sep 2017 21:01:14 +0000 (23:01 +0200)]
Move calling of special force functions
All special force provider algorithm functions are now collected
in computeSpecialForces(). This function is now called before
the wait for non-bondeds on the GPU. This allows for more overlap.
The only cost is potentially one extra force buffer reduction at
steps where the virial is needed, but with PME we already do that.
To achieve this, f->f_novirsum is now always set, as the
documentation already (incorrectly) stated.
This change also simplifies both do_force routines and clarifies
where special algorithms can be called, which could now actually
be anywhere in do_force().
Change-Id: I0711a379ed3c31838ede9e55c4cd5d0a95e967fd
Aleksei Iupinov [Wed, 20 Sep 2017 13:45:13 +0000 (15:45 +0200)]
Add WinAPI page size query for allocator purposes
Change-Id: I4c1e534a4b950bc38d45c40ad5598cabb2045ede
Magnus Lundborg [Tue, 13 Jun 2017 15:04:10 +0000 (17:04 +0200)]
Write atom masses and partial charges to TNG
B states of molecules are not yet written. Data blocks for that
must be added to TNG first.
'gmx dump' will print atom masses and partial charges along
with the molecule, if the data is available. No other utilities
use the data yet.
Refs #2188.
Change-Id: I7dd80d7b6281b2c3710fb541fa3cee6fbdcb2256
Mark Abraham [Mon, 3 Jul 2017 17:40:10 +0000 (19:40 +0200)]
Introduce and use AlignedAllocationPolicy
This permits the allocators to be specialised with a policy class that
can vary (and report on) the alignment.
Change-Id: I875548ac325edcf07074ad35f9d90cdf561ea750
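A rough sketch of the policy idea from the commit above: the allocator delegates allocation to a policy class, which also reports the alignment it guarantees. AlignedPolicySketch and PolicyAllocator are hypothetical names, and the over-allocating malloc scheme is an assumption made to keep the example portable; it is not the GROMACS implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical policy: 64-byte alignment by over-allocating with malloc.
struct AlignedPolicySketch
{
    static constexpr std::size_t alignment() { return 64; }

    static void* allocate(std::size_t bytes)
    {
        // Reserve room for alignment slack and for the original pointer.
        void* raw = std::malloc(bytes + alignment() + sizeof(void*));
        if (raw == nullptr)
        {
            return nullptr;
        }
        std::uintptr_t start   = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned = (start + alignment() - 1) / alignment() * alignment();
        // Remember the malloc pointer just below the aligned block.
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<void*>(aligned);
    }

    static void deallocate(void* p)
    {
        if (p != nullptr)
        {
            std::free(static_cast<void**>(p)[-1]);
        }
    }
};

// Minimal allocator that forwards to the policy and reports its alignment.
template<typename T, typename Policy>
class PolicyAllocator
{
public:
    using value_type = T;

    PolicyAllocator() = default;
    template<typename U>
    PolicyAllocator(const PolicyAllocator<U, Policy>&) noexcept {}

    T* allocate(std::size_t n)
    {
        void* p = Policy::allocate(n * sizeof(T));
        if (p == nullptr)
        {
            throw std::bad_alloc();
        }
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) { Policy::deallocate(p); }

    static std::size_t alignment() { return Policy::alignment(); }
};

template<typename T, typename U, typename P>
bool operator==(const PolicyAllocator<T, P>&, const PolicyAllocator<U, P>&) { return true; }

template<typename T, typename U, typename P>
bool operator!=(const PolicyAllocator<T, P>&, const PolicyAllocator<U, P>&) { return false; }
```

A container such as std::vector<float, PolicyAllocator<float, AlignedPolicySketch>> then gets aligned storage, and swapping in a different policy (for example, page alignment) changes the alignment without touching the allocator.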
Berk Hess [Wed, 9 Aug 2017 12:06:01 +0000 (14:06 +0200)]
Remove hybrid gpu+cpu nonbonded mode
This mode was not very useful, since it ran the non-local non-bonded
interactions on the CPU. The fraction of non-local interaction is set
by the domain decomposition, so this is not flexible.
Also this mode is not being tested.
Amended docs to remove ambiguous reference to hybrid mode
Change-Id: I8cdd31f228e2c0104527f66265fd06a5e971b9da
Szilárd Páll [Thu, 28 Sep 2017 14:22:17 +0000 (16:22 +0200)]
Correct misplaced CUDA timing event record
The event record happens after a cudaStreamWaitEvent is placed in the
non-local stream and therefore, in that stream it would include the wait
time in the measurement. However, as with DD / two streams timing can
not be performed due to the limitations of CUDA events, in practice this
was never an issue.
Change-Id: I2ca89c7acd461e480a324d40911dd4c6f5aac478
tomaskubar [Mon, 18 Sep 2017 16:57:19 +0000 (18:57 +0200)]
QM/MM: remove optimization and trans. state search
These functionalities used to only work with old versions of Orca,
had very limited use and will possibly not work any longer now.
The possibility of optimization with QM/MM using the Gromacs drivers
(steep, cg, lbfgs) has been already tested,
and it will be added in a follow-up commit.
Transition state search functionality will not be available however.
This is the beginning of an attempt to clean up the QM/MM code.
A version that works with Verlet nblists is already prepared
and will also follow.
Change-Id: Ice166dad0d4080821f1a20671fe698d549b3ce99
Aleksei Iupinov [Fri, 22 Sep 2017 12:41:02 +0000 (14:41 +0200)]
Move CUDA texture setup code from NB CUDA module to cudautils.cu
Change-Id: I7e47a65866c29be06ce522572e90a17c775157ab
Szilárd Páll [Thu, 21 Sep 2017 14:13:43 +0000 (16:13 +0200)]
Rename GPU launch/wait cycle counters
In preparation for the PME GPU task and GPU launch overhead to be
counted together in the same counter for all GPU tasks, the current main
counters have been renamed to be more general. The labels of GPU waits in
the performance table have also been renamed to reflect the task name.
Additionally, a non-bonded specific sub-counter has been added.
Change-Id: I65a15b0090c1ccebb300cf425c7b3be4100e17a0
Berk Hess [Sat, 23 Sep 2017 09:28:01 +0000 (11:28 +0200)]
Put rerun state preparation in a function
This is only code motion and some variable renaming.
Note that the vsite construction now constructs in the global state
pointer, but the old code asserted that this was identical to
the local state pointer.
Change-Id: I81d0bad0650eb781b68816e917814b2f97e2d529
Berk Hess [Wed, 6 Sep 2017 13:48:04 +0000 (15:48 +0200)]
Set global state on master rank only
The global state class was initialized on the master rank
and broadcast to all ranks. This led to unnecessary communication
and it was difficult to know what part of the global state was up to
date at different points in the code.
Now only the master rank has a global state object.
This change requires conditional access to the global state pointer
for the few algorithms that use it.
Replaced bcast_state() by a function that only broadcasts x and box.
Also made the box pointer constant in domdec initialization.
Change-Id: I924487863abe096eeb0b3cbc944b4ba32898ef03
Berk Hess [Sat, 23 Sep 2017 10:15:57 +0000 (12:15 +0200)]
Fix grompp with Andersen massive and no COM removal
Fixed a floating point exception leading to a segv.
Also fixed possibly different rounding of the interval for
Andersen massive in grompp and mdrun for the common case where tau_t
is a multiple of delta_t.
Fixes #2256
Change-Id: I161e8a9db2c31fde8a6e8c2fd32551b21423fd9b
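The log does not show how the interval is computed, so the following only illustrates the general pitfall behind rounding differences when tau_t is a multiple of delta_t. The helper names and the use of double are assumptions made for this sketch.

```cpp
#include <cmath>

// Hypothetical helpers; names and the use of double are assumptions.

// Truncation: a quotient that lands just below an integer loses a whole step.
inline int intervalByTruncation(double tau_t, double delta_t)
{
    return static_cast<int>(tau_t / delta_t);
}

// Round to nearest: stable when tau_t is a multiple of delta_t.
inline int intervalByRounding(double tau_t, double delta_t)
{
    return static_cast<int>(std::lround(tau_t / delta_t));
}
```

With IEEE-754 doubles, 0.3 / 0.1 evaluates to slightly less than 3, so the truncating version yields 2 while the rounding version yields 3. Two tools that differ in this way would disagree about the interval.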
Aleksei Iupinov [Thu, 21 Sep 2017 16:31:48 +0000 (18:31 +0200)]
Template the CUDA texture setup code on raw value type T
Change-Id: I252e1d68d263f4aca15f00863e9ed67213fdb22f
Aleksei Iupinov [Thu, 21 Sep 2017 15:51:49 +0000 (17:51 +0200)]
Eliminate unused Coulomb correction table size coulomb_tab_size
Change-Id: I35df7fc0590e12de8ad44791bce26cac1193065a
Aleksei Iupinov [Thu, 21 Sep 2017 15:43:17 +0000 (17:43 +0200)]
Eliminate duplicate CUDA texture setup code
init_ewald_coulomb_force_table() now uses initParamLookupTable(),
which is moved higher in the file.
Change-Id: I2cb799f1b4b78c650282be37b125b3da88e98f6c
Vedran Miletić [Tue, 19 Sep 2017 17:32:41 +0000 (19:32 +0200)]
Document t_commrec struct
Describe the mysim and mygroup MPI communicators and add a note about
the communicator subsetting.
Change-Id: I2d39bd6827da59db5b3c38bd33de6903e17d0d06
Berk Hess [Tue, 12 Sep 2017 11:21:50 +0000 (13:21 +0200)]
Change nbnxn indexing to templates
Replaced the index lookup functions for SIMD pair list generation
with templates.
The last remaining ci_to_cj function and call will be removed in
a follow-up change.
Change-Id: I3222e616ac60f846d7c1e85685f915fba014290f
Roland Schulz [Fri, 15 Sep 2017 21:47:37 +0000 (14:47 -0700)]
ICC: Disable include path warning
Under certain circumstances MKL and PSTL can cause spurious
warnings. Unlikely for warning to be important.
Change Icdb24d0fed4060 previously removed this flag for C++.
Change-Id: I828c73c7139532dca6ee2d1279b0d1e109b0cf5c
Mark Abraham [Fri, 15 Sep 2017 08:39:52 +0000 (10:39 +0200)]
Update testing matrices to test with (latest) icc-18
Change-Id: Ia1da377b40fcaba4f364b77437dcddea702968a2
Aleksei Iupinov [Mon, 18 Sep 2017 13:37:44 +0000 (15:37 +0200)]
Skip GPU reallocation of an empty non-local pairlist
This makes the corresponding timing event's usage consistent.
The common early return condition is moved into a new header
nbnxn_gpu_common.h, shared by OpenCL/CUDA.
Change-Id: I5b9cc027676cccda69f5b844d07e94e5fda620e7
Szilárd Páll [Wed, 13 Sep 2017 18:37:39 +0000 (20:37 +0200)]
Minor refactoring of CUDA non-bonded kernels
Refactored the dynamic shared memory pointer assignment for improved
robustness. Also reverted some unnecessary leftover changes introduced
with the Volta-enabling and dynamic pruning that resulted in differences
between the legacy and normal CUDA kernels with the goal of improving
maintainability.
Change-Id: I4ce58f3e1f963935d7e3832ed0e537c94d81a632
Mark Abraham [Sun, 17 Sep 2017 21:43:42 +0000 (23:43 +0200)]
Merge branch release-2016
Kept a cmake variable description string from release-2016
Change-Id: I2077780afbbe3cc610d699ea498caa6959e4000c
Berk Hess [Wed, 14 Sep 2016 10:42:31 +0000 (12:42 +0200)]
Remove nb-parameters from t_forcerec
Removed all Coulomb and Van der Waals mdp parameters from t_forcerec.
The ones that are not used in the Verlet scheme (yet) have been added
to interaction_const_t. Now init_interaction_const() is called early
in init_forcerec().
Change-Id: I9ca14f2194742cc8b0aef09a42dbd75fa3f94517
Mark Abraham [Mon, 11 Sep 2017 16:55:00 +0000 (18:55 +0200)]
Express intent of matrix release build types explicitly
We intend to do primary testing in Jenkins with all assertions
enabled, so that we maximize our opportunities for useful feedback. We
do so in both Debug and Release flavours, so that we have confidence
that those different builds will remain maximally useful.
Expressing that we are building the Release+assert type explicitly
will help developers understand when and why they get feedback from
Jenkins.
This also permits us to build without assertions in a convenient
way. This is useful for e.g. checking that such builds remain free of
warnings, which is a useful corner case to have checked, since a
variable or parameter might only be used in an assertion expression.
This now happens in one post-submit and all release matrix
configurations.
Updated the release-matrix contents and documentation to be more
consistent with that of the other matrices.
Noted TODO for adding icc 17 testing.
Change-Id: I774602e361163337e9602f497e2302148ccbf544
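The corner case mentioned in the commit above, a variable used only in an assertion expression, can be sketched like this. elementAt is a hypothetical helper, and the cast to void is just one common way to keep such builds warning-free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper; the bounds check lives only in the assertion.
inline int elementAt(const std::vector<int>& values, std::size_t index)
{
    const std::size_t size = values.size();
    // With NDEBUG defined, assert() expands to nothing, so 'size' would be
    // reported by -Wunused-variable if nothing else referred to it.
    assert(index < size);
    (void)size; // marks the variable as used in every build type
    return values[index];
}
```

Building the same code both with and without assertions is what exposes this kind of warning, since each configuration sees a different set of used variables.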
Mark Abraham [Wed, 6 Sep 2017 08:38:55 +0000 (10:38 +0200)]
Bumped patch version for next release
Change-Id: I2d5758148734f746f9f03afd1ce3a92a02ab3355
Mark Abraham [Wed, 6 Sep 2017 08:36:54 +0000 (10:36 +0200)]
Version 2016.4
Change-Id: I1b8141d667629f77025b186364090e861fa3a428
Aleksei Iupinov [Thu, 14 Sep 2017 13:12:31 +0000 (15:12 +0200)]
Fix non-SIMD build failing
gmx_detect_simd() now returns an uppercase string,
making GMX_SIMD_ACTIVE uniformly uppercase.
Refs #2246
Change-Id: Idb1c2373dc8955ef8bdb36eb0361505238a117ff
Berk Hess [Thu, 14 Sep 2017 14:34:35 +0000 (16:34 +0200)]
Fix nbnxn SIMD 2xNN PME tabulated kernel
This kernel produced completely incorrect forces and slightly
incorrect energies.
This kernel was not (yet) selected by default in a 2016 release.
Fixes #2247
Change-Id: I297ad257932eaabe6cf84b17f34ad555921f48b0
Szilárd Páll [Tue, 29 Aug 2017 16:39:03 +0000 (18:39 +0200)]
CUDA 9/Volta support for the pruning kernel
Added syncwarp to prevent a WAR hazard and changed any to any_sync.
Change-Id: I3d370e7c272dd0c2f0eaf8ee97f8797dd74a405d
Mark Abraham [Fri, 8 Sep 2017 16:21:12 +0000 (18:21 +0200)]
Form taskassignment module
Move existing code into new module, in preparation for an increase
in complexity of that new module.
Removed hyphenation of filename to move to more consistent style
overall.
Change-Id: I04e76048483b287485ac6872ed48ccba52df1e45
Mark Abraham [Mon, 11 Sep 2017 15:59:24 +0000 (17:59 +0200)]
Fix compat module definition
There needs to be a defgroup module_compat and whatever name it has
should match the name of the subdirectory. This needs to work
correctly before code in other modules can include a header from
compat module, else check-source will complain that a compat header
isn't documented as being used outside its module.
Change-Id: Ie6e17d92ecf7ede51e0e32af27bef732a52491f1
Erik Lindahl [Fri, 8 Sep 2017 19:03:58 +0000 (21:03 +0200)]
Improve accuracy of SIMD exp for small args
Introduce a separate check for small arguments and
set the result to zero below this cutoff. This change
also moves the exponential tests to ldexp(), which
as a side-effect also makes that function safer by
default, with an optional template parameter to avoid
the checks.
Fixes #2243.
Change-Id: I41bccaeec9921c3aead2cd2caf41cbe2206e0687
Mark Abraham [Tue, 12 Sep 2017 10:01:48 +0000 (12:01 +0200)]
Merge branch release-2016
No conflicts
Change-Id: Iababf3d2439f1b465ebdf8bcb3d29815cb7830e1
Aleksei Iupinov [Wed, 6 Sep 2017 15:18:44 +0000 (17:18 +0200)]
Prevent reference XML reader segfault on nullptr error strings
It is possible for TinyXML2 to return null pointers to the error strings.
This could cause XML reference data reader to segfault.
Refs #2241
Change-Id: I8b72917785080023f75388281dd2cbb4f30da925
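A minimal sketch of the kind of guard the commit above describes: a possibly-null C string is mapped to a usable std::string before it is formatted into a message. The helper name and the fallback text are hypothetical, and no TinyXML2 calls are shown.

```cpp
#include <string>

// Hypothetical helper: never construct a std::string from a null pointer,
// since doing so is undefined behaviour and typically segfaults.
std::string safeErrorString(const char* errorString)
{
    return errorString != nullptr ? std::string(errorString)
                                  : std::string("(no error details available)");
}
```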
Berk Hess [Fri, 8 Sep 2017 13:35:23 +0000 (15:35 +0200)]
Fix exception in SIMD LJ PME solve
Clear SIMD padding elements in solve helper arrays to avoid
otherwise harmless fp overflow exceptions.
Fixes #2242
Change-Id: I97e67c4fcc2ef361f54d1627fd0dab4621f4bd33
Berk Hess [Thu, 7 Sep 2017 17:52:29 +0000 (19:52 +0200)]
Do not pass state to init_replica_exchange
init_replica_exchange only used the state to check natoms,
so now we pass only that.
Change-Id: Ie75ca26b9899006f9ca4584c42391a50758c7ed5
Szilárd Páll [Mon, 11 Sep 2017 17:33:09 +0000 (19:33 +0200)]
Merge branch 'release-2016'
Note that changes to the simd_math module/tests from the 2016 branch
were omitted in favor of the current code in master.
Conflicts:
src/gromacs/fileio/oenv.cpp
src/gromacs/fileio/oenv.h
src/gromacs/gmxlib/nonbonded/CMakeLists.txt
src/gromacs/hardware/cpuinfo.cpp
src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_kernel.cuh
src/gromacs/simd/simd_math.h
src/gromacs/simd/tests/simd_math.cpp
src/gromacs/swap/swapcoords.cpp
Change-Id: I357e40f97fd53a34ff900f40bb3fdeb20d864c13
Szilárd Páll [Mon, 4 Sep 2017 15:26:59 +0000 (17:26 +0200)]
NVIDIA Volta performance tweaks
Removed ballot syncs and replaced all computed masks with full warp
mask (as all branches in question are warp-synchronous).
This improves performance by 7-12%.
Change-Id: I769d6d8f0d171eb528d30868d567624d5e246dbf
Berk Hess [Fri, 8 Sep 2017 12:49:40 +0000 (14:49 +0200)]
Fix PBC bugs in the swap code
Fixes #2245
Change-Id: I90e2bed71a2499c63794e420dca383d91e6fc86c
Berk Hess [Sat, 22 Apr 2017 09:41:54 +0000 (11:41 +0200)]
Avoid inf in SIMD double sqrt()
Arguments >0 and <float_min to double precision SIMD sqrt()
would produce inf on many SIMD architectures. Now sqrt() will
return 0 for arguments in this range, which is not fully correct,
but should be unproblematic.
Updated the tests to check for this range and to produce output
that checks all double precision mantissa bits.
Fixes #2164.
Refs #2163.
Change-Id: Ic6d2c6d4102d602703b40e7e8bcc1974a7283f7c
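A scalar sketch of the masking described in the commit above, under the assumption that the SIMD code applies an equivalent per-lane comparison; the function name is hypothetical and std::sqrt stands in for the SIMD iteration.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar stand-in for the double-precision SIMD sqrt().
// Arguments in [0, float_min) return 0 rather than risking inf.
double sqrtWithSmallArgMask(double x)
{
    const double floatMin = std::numeric_limits<float>::min();
    if (x >= 0.0 && x < floatMin)
    {
        return 0.0; // not fully correct, but unproblematic per the commit
    }
    return std::sqrt(x);
}
```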
Roland Schulz [Thu, 7 Sep 2017 00:40:06 +0000 (17:40 -0700)]
Fix compiler warning
Change-Id: Ic665baecec83f89bebb0fddea5385ae88c075adb
Mark Abraham [Mon, 4 Sep 2017 07:49:59 +0000 (09:49 +0200)]
Introduce type-safe GPU emulation variables
Passing around many variables of bool type, as often happens in the
high-level code, makes it harder to see that the code is correct.
Change-Id: I9ddb35fa8789e3eb5726fc0143a580493737bc29
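To illustrate the idea in the commit above, a sketch with hypothetical names (UseGpu, EmulateGpu, describeNonbondedSetup): with one enum class per flag, swapping the two arguments no longer compiles, whereas two bools would be exchanged silently.

```cpp
#include <string>

// Hypothetical type-safe replacements for two bool flags.
enum class UseGpu
{
    No,
    Yes
};
enum class EmulateGpu
{
    No,
    Yes
};

// Each parameter has its own type, so the call site documents itself
// and the compiler rejects arguments passed in the wrong order.
std::string describeNonbondedSetup(UseGpu useGpu, EmulateGpu emulateGpu)
{
    if (emulateGpu == EmulateGpu::Yes)
    {
        return "emulation";
    }
    return useGpu == UseGpu::Yes ? "gpu" : "cpu";
}
```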
Roland Schulz [Mon, 31 Jul 2017 23:56:00 +0000 (16:56 -0700)]
Add isIntegralConstant type trait
Add static assert to pme-gather to make overload safer
Change-Id: I71c67f3ee40185e31796752d090d2aa9f0918ec8
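One way such a trait and static assert might look, as a sketch only: the template parameters, the Sketch suffix and the splineOrderOf function are assumptions for illustration, not the code added by the commit above.

```cpp
#include <type_traits>

// Primary template: an arbitrary type is not an integral constant.
template<typename T, typename Int>
struct isIntegralConstantSketch : std::false_type
{
};

// Specialization: matches std::integral_constant<Int, N> for any N.
template<typename Int, Int N>
struct isIntegralConstantSketch<std::integral_constant<Int, N>, Int> : std::true_type
{
};

// Hypothetical overload that only accepts a compile-time order.
template<typename Order>
int splineOrderOf(Order)
{
    static_assert(isIntegralConstantSketch<Order, int>::value,
                  "Order must be a std::integral_constant<int, N>");
    return Order::value;
}
```

Passing a plain int to splineOrderOf then fails at compile time with a readable message instead of selecting the wrong overload.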
Mark Abraham [Wed, 6 Sep 2017 10:31:52 +0000 (12:31 +0200)]
Enable group-scheme SIMD kernels on recent AVX extensions
The group-scheme code only runs using the feature set of AVX_256, but
that is supported on the more recent hardware, so we should have the
group scheme run with the maximum suitable SIMD. Otherwise people will
need AVX_256 binaries for the group scheme, and other binaries for the
other support.
Change-Id: I7728e1d97998509368dfc456dc9c33489b6dfee5
Berk Hess [Thu, 7 Sep 2017 08:05:33 +0000 (10:05 +0200)]
Fix FEP state with rerun
When using FEP states with rerun, the FEP state was always 0.
Fixes #2244
Change-Id: I457bf444f6c7f8fd357212416311625981b833e6
Mark Abraham [Mon, 4 Sep 2017 21:53:49 +0000 (23:53 +0200)]
Prepare Verlet scheme later in runner
As noted in the TODO, preparing the Verlet scheme can wait until we
need rlist chosen so that we can set up the domain
decomposition. Moving it later in mdrun helps clarify the unrelated
process of setting up thread-MPI, and also means that the state used
when preparing the Verlet scheme is the one from the checkpoint file,
where applicable.
Moved a warning for the group scheme on BG/Q to a more appropriate
place.
Noted some TODOs for managing kinds of state better in future.
Change-Id: I99a014d9465b46b32038cfdcee99dc3896d2f265
Roland Schulz [Thu, 20 Jul 2017 22:07:49 +0000 (15:07 -0700)]
PME-gather: Use templated functor instead of preprocessor
Added restrict in several places, but this does not affect performance
with gcc and icc.
Change-Id: Id366621fa3ad02ca182b8a4da48cae940059cf46
Christoph Junghans [Sun, 13 Aug 2017 15:52:18 +0000 (09:52 -0600)]
Support quiet trajectory-handling I/O
Permits GMX_TRAJECTORY_IO_VERBOSITY=0 to be set to keep frame-reading
code quiet, which is convenient for tools using libgromacs.
Change-Id: I873dcf229d9c20c8dd3b5097784236c2c2c478c1
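A sketch of how a tool might test the variable from the commit above. Only the variable name and the value 0 come from the commit message; the helper names and the exact string comparison are assumptions.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: treats exactly "0" as a request for quiet I/O.
bool isQuietSetting(const char* value)
{
    return value != nullptr && std::strcmp(value, "0") == 0;
}

// Reads the environment variable named in the commit message.
bool trajectoryIoIsQuiet()
{
    return isQuietSetting(std::getenv("GMX_TRAJECTORY_IO_VERBOSITY"));
}
```

A program linking against libgromacs could export GMX_TRAJECTORY_IO_VERBOSITY=0 in its environment before reading frames.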
Mark Abraham [Sun, 3 Sep 2017 13:59:12 +0000 (15:59 +0200)]
Fixes for Solaris
Suggested by Maureen Chew, of Oracle.
Change-Id: I5c3f868721944e7586b7001f3ffdf6ab17953526