Aleksei Iupinov [Thu, 9 Nov 2017 18:01:43 +0000 (19:01 +0100)]
Rename and expose "generic" GPU memory transfer functions
Dropped the "_generic" suffix from the names. Made the sync/async
argument an enum class instead of boolean.
Made PME use synchronous versions of the functions for unit tests.
Change-Id: I5fd2490d58370d9f0405aea1a74237fa8107cbab
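A minimal sketch of why an enum class reads better than a boolean at the call site. All names here (TransferKind, TransferLog, copyToDevice) are hypothetical and not the GROMACS API; a plain memcpy stands in for the device transfer so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical names throughout; this is not the GROMACS API.
enum class TransferKind
{
    Sync, // return only after the copy has completed
    Async // enqueue the copy and return immediately
};

// Counts which mode each call requested, so the choice is observable.
struct TransferLog
{
    int syncCalls  = 0;
    int asyncCalls = 0;
};

// Stand-in for a host-to-device copy. A plain memcpy replaces the real
// transfer, so both modes behave identically here apart from the log.
inline void copyToDevice(void* dst, const void* src, std::size_t bytes,
                         TransferKind kind, TransferLog* log)
{
    assert(dst != nullptr && src != nullptr && log != nullptr);
    std::memcpy(dst, src, bytes);
    if (kind == TransferKind::Sync)
    {
        ++log->syncCalls;
    }
    else
    {
        ++log->asyncCalls;
    }
}
```

A call such as copyToDevice(dst, src, n, TransferKind::Sync, &log) documents itself, whereas a trailing "true" or "false" would not, and an enum class cannot be silently swapped with another boolean argument.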
Erik Lindahl [Sun, 12 Nov 2017 13:14:10 +0000 (06:14 -0700)]
Only issue FFT warning messages on changes
Similar to other CMake modules, we should only issue
warnings at the first invocation, or if the FFT library
was changed.
Change-Id: I6dba59f1021984d9a744a55d797814c1c9d89b20
Roland Schulz [Sat, 8 Jul 2017 00:40:48 +0000 (17:40 -0700)]
PME-gather: 4xN SIMD
Speedup on KNL 11% for spread/gather (3% total) on ion-channel
Change-Id: I1a0624408b4e8f7bd441dfe2c260f80d211351d0
Mark Abraham [Tue, 24 Oct 2017 15:40:58 +0000 (17:40 +0200)]
Import cmake Modules/FindCUDA.cmake
CUDA 9.0 issues large numbers of -Wundef warnings from its internal
headers. FindCUDA.cmake should be including such headers as "system"
headers, so to prepare for a patch where it is modified to do that,
this commit imports that file from v3.4.3 of the CMake repository,
because that is a choice likely to work with all future versions of
CMake.
It needs some supporting cmake files that are included unmodified,
so GROMACS does not assert copyright on those. The main FindCUDA.cmake
file is modified only to be able to find those files.
Refs #2276
Change-Id: I69ad39dc805648a6cc5e27bb7fcd229f5f2a538a
Roland Schulz [Thu, 9 Nov 2017 22:43:45 +0000 (14:43 -0800)]
Rename load1DualHsimd to loadU1DualHsimd
The documentation didn't require any alignment, the test didn't use
alignment, and none of the implementations required any alignment.
But the name suggested that alignment is required.
Only current usage had 2-wide alignment but requiring that
would make the function less general without any advantage.
Change-Id: I651c1327a3febc368cb4b039ad226d0771770e60
Roland Schulz [Thu, 9 Nov 2017 23:02:41 +0000 (15:02 -0800)]
AVX: Improve load1DualHsimd
instr+uop: 4->3, throughput/port-pressure(on 5): 3->1
(IACA numbers for IVB-SKL)
Change-Id: Id768cb951dcbace1473448fcd63fa7d40b0e7da6
Roland Schulz [Fri, 10 Nov 2017 02:08:08 +0000 (03:08 +0100)]
Revert "Use -mavx2 -mfma instead of -march with AVX2"
This reverts commit 062a6b81498b61b2bfc4ec7441b844d76aae445b.
Reason for revert: Breaks support for ICC (16-18) which doesn't have -mavx2 or -mfma.
Change-Id: I01cf3e9db332a405fd9419b6382240f5fcecf633
Aleksei Iupinov [Thu, 9 Nov 2017 18:07:30 +0000 (19:07 +0100)]
Rename synchronous GPU transfer functions to match the asynchronous ones
Change-Id: I5cb8e9cab208c1d0c62f985ec3140540ea427fb2
Mark Abraham [Wed, 8 Nov 2017 11:22:57 +0000 (12:22 +0100)]
Prepared t_mdatoms for using vector
Wrapped it in another C++ class because the group-scheme kernels
compile as plain C and this permits the contained t_mdatoms to
be unmodified. The class has responsibility for maintaining the
allocations for any of the fields of t_mdatoms that need to be
managed with a std::vector plus perhaps an allocator.
Change-Id: I6fef70beeb8d43f3e048cec02380f8ebf8153ecb
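The wrapping idea described above can be sketched as follows. This is an illustration under stated assumptions, not the actual class: MdAtomsC and MdAtomsOwner are hypothetical names, and the single chargeA field stands in for whichever t_mdatoms fields need managed storage.

```cpp
#include <cassert>
#include <vector>

// Plain-C-compatible struct, left unmodified so that code compiled as C
// can still read it. Field names are illustrative only.
struct MdAtomsC
{
    int    nr;
    float* chargeA;
};

// C++ owner that keeps the raw pointer in the C struct in sync with a
// std::vector that manages the allocation.
class MdAtomsOwner
{
public:
    MdAtomsOwner() = default;
    // Copying would leave the copy's raw pointer aimed at the source's
    // vector, so copying is disabled in this sketch.
    MdAtomsOwner(const MdAtomsOwner&) = delete;
    MdAtomsOwner& operator=(const MdAtomsOwner&) = delete;

    void resize(int numAtoms)
    {
        assert(numAtoms >= 0);
        chargeA_.resize(numAtoms);
        data_.nr      = numAtoms;
        data_.chargeA = chargeA_.data(); // refresh after any reallocation
    }

    // Pointer suitable for passing to C code.
    MdAtomsC* get() { return &data_; }

private:
    std::vector<float> chargeA_;
    MdAtomsC           data_ = { 0, nullptr };
};
```

The C side only ever sees MdAtomsC, so its layout does not change; the vector (and, later, a custom allocator) lives entirely in the C++ wrapper.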
Mark Abraham [Mon, 6 Nov 2017 07:45:49 +0000 (08:45 +0100)]
Introduce HostAllocationPolicy
This permits host-side standard containers and smart pointers to have
their contents placed in memory suitable for efficient GPU transfer.
The behaviour can be configured at run time during simulation setup,
so that if we are not running on a GPU, then none of the buffers that
might be affected actually are. The downside is that all such
containers now have state.
Change-Id: I9367d0f996de04c21312cef2081cc08148f80561
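A rough sketch of what "containers now have state" means for a run-time configurable allocator. The names HostMemoryMode and ModeAllocator are hypothetical, and both modes use malloc here; a real policy would presumably pin or page-lock memory in the transfer-friendly mode.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical run-time switch; not the real HostAllocationPolicy interface.
enum class HostMemoryMode
{
    Plain,
    TransferFriendly
};

template<typename T>
class ModeAllocator
{
public:
    using value_type = T;

    explicit ModeAllocator(HostMemoryMode mode = HostMemoryMode::Plain) : mode_(mode) {}

    // Rebinding constructor required of standard allocators.
    template<typename U>
    ModeAllocator(const ModeAllocator<U>& other) : mode_(other.mode())
    {
    }

    T* allocate(std::size_t n)
    {
        // Assumption: a real implementation would branch on mode_ and use
        // a GPU-runtime allocation call for TransferFriendly memory.
        void* p = std::malloc(n == 0 ? 1 : n * sizeof(T));
        if (p == nullptr)
        {
            throw std::bad_alloc();
        }
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t /*n*/) { std::free(p); }

    HostMemoryMode mode() const { return mode_; }

private:
    HostMemoryMode mode_; // this member is the "state" the container carries
};

// Allocators with different modes must not free each other's memory.
template<typename T, typename U>
bool operator==(const ModeAllocator<T>& a, const ModeAllocator<U>& b)
{
    return a.mode() == b.mode();
}

template<typename T, typename U>
bool operator!=(const ModeAllocator<T>& a, const ModeAllocator<U>& b)
{
    return !(a == b);
}
```

Because the mode is stored per allocator instance, the choice can be made during simulation setup rather than at compile time, at the cost of every such container carrying that extra member.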
Roland Schulz [Mon, 30 Oct 2017 18:33:33 +0000 (11:33 -0700)]
ICC should use ZMM if code anyhow uses ZMM
Change-Id: Iaea73df12065b3d4ba1974e48b864f44c9b7fe44
Roland Schulz [Wed, 25 Oct 2017 19:24:42 +0000 (12:24 -0700)]
Fix scalar blend
Change-Id: I580af279cdba494ec13029259e4fd0867a7e5ea2
Magnus Lundborg [Mon, 23 Oct 2017 11:20:00 +0000 (13:20 +0200)]
Update to TNG v 1.8.1
Fixes #2187 and #2250.
Change-Id: Icf81d5f3ce916e984750e1511d32e16ebc45b6f9
Szilárd Páll [Wed, 18 Oct 2017 15:01:54 +0000 (17:01 +0200)]
Use -mavx2 -mfma instead of -march with AVX2
This was (likely) only a workaround for some early gcc version that did
not support correct AVX2 code-generation with just the -mavx2 -mfma
flags. However, with AVX2, just as with other SIMD flavors, we should
not request arch-specific tuning just to get the desired SIMD flavor
enabled.
Change-Id: Ib0c6388bebcffbf0719b438451d3943f51fba4a4
Mark Abraham [Tue, 7 Nov 2017 02:26:32 +0000 (03:26 +0100)]
Reform gmx_pme_pp alloc and use vector
Introduced a helper struct for describing the partner PP ranks.
Reduced some of the conditional compilation.
Updated some naming from node to rank.
Fixed over-use of charge_pp.
Change-Id: I00b59dd116740721ed707af4242c0d44f1615d56
Mark Abraham [Mon, 6 Nov 2017 07:43:24 +0000 (08:43 +0100)]
Introduce gmxopencl.h
This header wraps the different ways to include the main OpenCL header
on different platforms, including suppressions for the warnings about
usage of deprecated API elements. NVIDIA only officially supports the
version with the deprecated elements, so we need to continue to use it.
Change-Id: Ie24f20d43272e1747bcbd693815e96cc200d5f50
Szilárd Páll [Fri, 20 Oct 2017 20:26:25 +0000 (22:26 +0200)]
Merge common nbnxn CUDA/OpenCL GPU wait code-paths
The entire GPU wait including timing accumulation as well as staging
data reduction of the nonbonded GPU modules has been unified by
including a single templated version of the code into the common header.
Code has only been moved and changed in minor ways when necessary (e.g.
for the rvec reduction).
Change-Id: Ic9c9690be58a78f92ca99d2af30068e19c19cc6c
Mark Abraham [Tue, 10 Oct 2017 10:09:10 +0000 (10:09 +0000)]
Test clang on ARM in nightly matrix
Also suppress lots of compiler warnings from useless use of
__vectorcall on this target for this compiler.
ARM are targeting clang for future development, so hopefully this
either isn't needed or will work in future. Either way, this change
will continue to do the right thing.
Change-Id: I211952a24aefee8434cc6b32322f359b2a22687b
Szilárd Páll [Mon, 23 Oct 2017 14:11:46 +0000 (16:11 +0200)]
Add wallcycle timer for the PME GPU F reduction
Change-Id: I85185f2acdf3ebdcbac109ef723eb458bc0e9008
Szilárd Páll [Fri, 20 Oct 2017 17:52:13 +0000 (19:52 +0200)]
Split off nbnxn GPU timing and staging reduction
Code reorganization that moves the timing related functions as well as
energy and shift force reduction into separate functions in both CUDA
and OpenCL versions of nbnxn_gpu_wait_for_gpu().
Change-Id: Ic5c9694d9de7f80a772e97f5c9e05bab77a3b82a
Mark Abraham [Tue, 7 Nov 2017 01:52:18 +0000 (02:52 +0100)]
Improve PME includes
Changing an internal ewald-module header for GPU support should not
lead to files outside that module needing to be recompiled. Moved enum
declarations for use outside the module to the header file that
declares such things. Restored necessary includes that were being
satisfied transitively from the internal header, that were prematurely
removed in fae8902688dc48be56e.
Change-Id: I18c3146e80aba9ad0a2c485f2355bc214cbb083c
Szilárd Páll [Fri, 20 Oct 2017 18:55:45 +0000 (20:55 +0200)]
Deduplicate CUDA and OpenCL timer struct
The struct is identical in both CUDA/OpenCL so it's better placed in a
common header, but this needs to be an internal-only header as it pulls
in CUDA dependencies.
Change-Id: I907d68b7c298f2ba0e7a1af2baf4819f637e2f2e
David van der Spoel [Thu, 12 Oct 2017 07:06:44 +0000 (09:06 +0200)]
Fixed check for water in gen_vsite.cpp
Pdb2gmx would break when generating virtual sites if water oxygens
were not named OW. Now checking for the atomnumber instead.
Fixes #2268
Change-Id: I326f683e4940ad02351dcbe0c00e266a82b203f6
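The difference between the two checks can be illustrated with a small sketch. AtomInfo and both function names are hypothetical and only mirror the idea in the message above; they are not taken from gen_vsite.cpp.

```cpp
#include <string>

// Hypothetical minimal atom record for illustration.
struct AtomInfo
{
    std::string name;
    int         atomicNumber;
};

// Name-based check: fails for water models whose oxygen is not named "OW".
inline bool isOxygenByName(const AtomInfo& atom)
{
    return atom.name == "OW";
}

// Element-based check: independent of the naming convention.
inline bool isOxygenByAtomicNumber(const AtomInfo& atom)
{
    return atom.atomicNumber == 8;
}
```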
Mark Abraham [Fri, 3 Nov 2017 01:05:42 +0000 (02:05 +0100)]
Merge "Merge branch release-2016"
Berk Hess [Wed, 1 Nov 2017 16:21:48 +0000 (17:21 +0100)]
Fix Ekin at step 0 with COM removal
The kinetic energy at step 0 was computed from the velocities without
the center of mass velocity removed. This could cause a relatively
large jump in kinetic energy, especially for small systems.
Now compute_globals is called twice with COM removal so we get
the correct kinetic energy.
Appropriate mdrun tests for energy-conserving integrators are also added.
Change-Id: I87ab08d21a35621735ab3c65fc50af9992120be3
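To illustrate why the order of operations matters, here is a small self-contained sketch that computes the kinetic energy before and after removing the center-of-mass velocity. It is not the compute_globals code path; Particle and the three helper functions are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Particle
{
    double                mass;
    std::array<double, 3> v;
};

// Mass-weighted mean velocity of the whole system.
inline std::array<double, 3> comVelocity(const std::vector<Particle>& particles)
{
    std::array<double, 3> vcom      = { { 0.0, 0.0, 0.0 } };
    double                totalMass = 0.0;
    for (const Particle& p : particles)
    {
        for (std::size_t d = 0; d < 3; ++d)
        {
            vcom[d] += p.mass * p.v[d];
        }
        totalMass += p.mass;
    }
    if (totalMass > 0.0)
    {
        for (std::size_t d = 0; d < 3; ++d)
        {
            vcom[d] /= totalMass;
        }
    }
    return vcom;
}

// Subtract the center-of-mass velocity from every particle.
inline void removeComVelocity(std::vector<Particle>* particles)
{
    const std::array<double, 3> vcom = comVelocity(*particles);
    for (Particle& p : *particles)
    {
        for (std::size_t d = 0; d < 3; ++d)
        {
            p.v[d] -= vcom[d];
        }
    }
}

// Sum of 0.5 * m * |v|^2 over all particles.
inline double kineticEnergy(const std::vector<Particle>& particles)
{
    double ekin = 0.0;
    for (const Particle& p : particles)
    {
        for (std::size_t d = 0; d < 3; ++d)
        {
            ekin += 0.5 * p.mass * p.v[d] * p.v[d];
        }
    }
    return ekin;
}
```

For two unit-mass particles moving at 1 and 3 along x, the kinetic energy is 5 before COM removal and 1 afterwards, which is the kind of jump a small system would show if step 0 used the uncorrected velocities.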
David van der Spoel [Tue, 31 Oct 2017 12:25:56 +0000 (13:25 +0100)]
New mdp input for electric fields.
New format for MDP input for electric fields that is consistent
with the manual and that is comprehensible.
Change-Id: I5f9f434080f5217d2473c16377aee962692b9ee9
Aleksei Iupinov [Tue, 31 Oct 2017 22:51:19 +0000 (23:51 +0100)]
Replace math.h by cmath includes in cpp files
Partially fixes #2285 (for non-GPU build)
Change-Id: I638a0b8ba5e4e04e00730b01640ac7c6a41834ed
Mark Abraham [Thu, 2 Nov 2017 09:43:11 +0000 (10:43 +0100)]
Merge branch release-2016
Ensured fix for gmx compare cmp_atoms went to the right code.
Change-Id: Iabc8ec03e7ebc45517f63697c3e7dea12b3f5398
Berk Hess [Thu, 2 Nov 2017 08:42:39 +0000 (09:42 +0100)]
Add missing Ewald correction for pme-user
With coulomb-type = pme-user, the Ewald mesh energy was not subtracted
leading to (very) incorrect Coulomb energies and forces.
Fixes #2286
Change-Id: Idfef9896d484e254264150e718c5516a832a2ad4
Paul Bauer [Mon, 30 Oct 2017 14:40:16 +0000 (15:40 +0100)]
Small change to LaTeX manual generation
Removed the gmxlite if statements in the pdf manual source files. They
made it more difficult to generate the new markup style files and are
apparently not needed.
Change-Id: Ica401f103c8f9682c7a45bdd90aa8680db7ff56a
Mark Abraham [Mon, 30 Oct 2017 17:13:07 +0000 (18:13 +0100)]
Fix thread-MPI rank choice for orientation restraints
Only a single rank is supported, so that must be what the thread-MPI
code will choose. There's another check later on that catches the
multi-rank MPI case.
Change-Id: I9ccf5fbe958fc0c004a89ebc92a352460e9cba1f
Aleksei Iupinov [Wed, 1 Nov 2017 11:35:53 +0000 (12:35 +0100)]
Remove unused PME GPU declarations
Change-Id: If64bcf73e825f6cd5ba48345f931c9dd25241046
Aleksei Iupinov [Wed, 1 Nov 2017 11:31:52 +0000 (12:31 +0100)]
Move pme_gpu_finish_computation() documentation to the declaration
Change-Id: I4970424eb5108e51c6e8b00b55a60854900e16b9
Paul Bauer [Wed, 1 Nov 2017 11:44:51 +0000 (12:44 +0100)]
Fixing missing references in web documentation
Change-Id: Ifca209c15f4cec3fed24e2070df8fa85320d02dd
Aleksei Iupinov [Tue, 31 Oct 2017 16:15:51 +0000 (17:15 +0100)]
Fix erroneous PME GPU "step" namings
Previous PME GPU code/documentation assumed single PME computation
per MD step, while there can actually be several. This change
replaces erroneous "step" names in the PME GPU module with
"(PME) computation" and similar.
Change-Id: Id230e848e0db0648a429bfc35a59106d1db1f7c9
Mark Abraham [Wed, 25 Oct 2017 10:08:01 +0000 (12:08 +0200)]
Improve handling of GPU IDs
Shifted responsibility for handling parsing of mdrun -gpu_id to early
in the runner, rather than as part of the assignment process.
Moved utility string handling + tests to taskassignment module, since
they only supported this process. Updated string handling in gmx
tune_pme to use more std::string and use the new
functionality. makeGpuIds will be used to replace the code in
assign_rank_gpu_ids in a subsequent patch.
Change-Id: I8d39cc69d0f96ac395858ed7cbe9f2947081b384
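As an illustration of parsing the ID string early and in one place, the sketch below turns a string of digits into a list of integer IDs. The function name parseGpuIdDigits and the one-digit-per-ID format are assumptions made for this example; they are not taken from the taskassignment module.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: assumes each character of the string is one GPU ID,
// so "0011" means IDs 0, 0, 1, 1.
inline std::vector<int> parseGpuIdDigits(const std::string& idString)
{
    std::vector<int> ids;
    ids.reserve(idString.size());
    for (char c : idString)
    {
        if (!std::isdigit(static_cast<unsigned char>(c)))
        {
            throw std::invalid_argument("GPU ID string must contain only digits: " + idString);
        }
        ids.push_back(c - '0');
    }
    return ids;
}
```

Validating the string once, up front, means later assignment code can work with a plain vector of integers and never needs to handle malformed input.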
Aleksei Iupinov [Fri, 27 Oct 2017 13:37:47 +0000 (15:37 +0200)]
Simplify PME GPU synchronization code
Most synchronization events are removed; synchronization is mostly
done by a single stream synchronization call at the end of the step.
Change-Id: Ia793f2623d81ae8e3f6dfb5c84a6a636e422d982
Aleksei Iupinov [Tue, 31 Oct 2017 15:42:25 +0000 (16:42 +0100)]
Reuse epbcXY logic
Change-Id: I9ec7521b050521932b64b2b08a58c7b530975fb0
Szilárd Páll [Tue, 31 Oct 2017 14:11:38 +0000 (15:11 +0100)]
Fix nstlist increase warning print
The log file warning print had a buggy conditional which this commit
fixes.
Change-Id: Ic106fa3fba54b2c394818e3a642f462d2675a2b1
Szilárd Páll [Mon, 16 Oct 2017 15:40:23 +0000 (17:40 +0200)]
Check CUDA available/compiled code compatibility
Added an early check to detect when the gmx binary does not embed code
compatible with the GPU device it tries to use nor does it have PTX that
could have been JIT-ed.
Additionally, if the user manually sets GMX_CUDA_TARGET_COMPUTE=20 and
no later SM or COMPUTE but runs on >2.0 hardware, we'd be executing
JIT-ed Fermi kernels with incorrect host-side code assumptions
(e.g amount of shared memory allocated or texture type).
This change also prevents such cases.
Fixes #2273
Change-Id: I5472b1a33e584a75f451e21e9fd25992633fbea9
Mark Abraham [Wed, 25 Oct 2017 09:45:32 +0000 (11:45 +0200)]
Update treatment of GPU compatibility data structure
Now we only construct the vector of compatible GPUs once per mdrun,
and are less coupled to hw_info and gpu_info structs.
Change-Id: I181f0486d0ea1670de7a85046c94c1fef83dce17
Szilárd Páll [Tue, 31 Oct 2017 14:18:27 +0000 (15:18 +0100)]
Fix nstlist increase warning print
The log file warning print had a buggy conditional which this commit
fixes.
NOTE: skip when merging, upstream fix submitted separately.
Change-Id: Id85223a3f762bbab26525a60987870d77cd5a01c
David van der Spoel [Mon, 30 Oct 2017 08:03:13 +0000 (09:03 +0100)]
Fixed mdp output from electric field code.
Added two new tests for MDP output.
Fixes #2258
Change-Id: I495454bd2349be836c1a3ef5985288a996abf20e
Aleksei Iupinov [Mon, 30 Oct 2017 10:56:17 +0000 (11:56 +0100)]
Fix reference mode build unused function warnings
Change-Id: Ibd1ad83c5dbeffe86e47156d456d78ab1ab8aeeb
Berk Hess [Sun, 29 Oct 2017 21:20:54 +0000 (22:20 +0100)]
Remove unused sign parameter from dih_angle()
Change-Id: I88a73ca49b6acfc59b4baf0d847aa81542a870ca
Roland Schulz [Fri, 13 Oct 2017 18:49:46 +0000 (11:49 -0700)]
ArrayRef: Replace fromVector with subArray
Creating ArrayRef from iterators is potentially dangerous,
because it is incorrect for non-contiguous containers.
arrayRefFromVector(v.begin()+start, v.begin()+start+length)
is replaced with
ArrayRef<T>(v).subArray(start, length)
Also:
- Combine all conversion constructors
Removes code duplication and makes conversion more powerful
(e.g. base pointer or containers with allocators).
- remove fromPointers and arrayRefFromPointers
Wasn't used by any code
- remove fromArray and replace with arrayRefFromArray
Change-Id: I05ad6b285ece58739d9f5bce48f9ecf4ade3454e
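The subArray pattern above can be sketched with a stripped-down view class. SimpleArrayRef is a hypothetical stand-in, not gmx::ArrayRef; it only shows how constructing from the container and then slicing avoids ever building a view from an arbitrary iterator pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal non-owning view over contiguous memory.
template<typename T>
class SimpleArrayRef
{
public:
    SimpleArrayRef(T* begin, std::size_t size) : begin_(begin), size_(size) {}

    // Construct from a vector (with any allocator); contiguity is guaranteed.
    template<typename Alloc>
    SimpleArrayRef(std::vector<T, Alloc>& v) : begin_(v.data()), size_(v.size())
    {
    }

    // View of 'length' elements starting at 'start', bounds-checked.
    SimpleArrayRef subArray(std::size_t start, std::size_t length) const
    {
        assert(start <= size_ && length <= size_ - start);
        return SimpleArrayRef(begin_ + start, length);
    }

    std::size_t size() const { return size_; }
    T&          operator[](std::size_t i) const
    {
        assert(i < size_);
        return begin_[i];
    }

private:
    T*          begin_;
    std::size_t size_;
};
```

Usage mirrors the replacement in the message: SimpleArrayRef<int>(v).subArray(start, length) instead of passing v.begin()+start and v.begin()+start+length.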
David van der Spoel [Fri, 13 Oct 2017 16:36:27 +0000 (18:36 +0200)]
Added option -water tips3p to pdb2gmx.
Fixes #2272
Change-Id: Ibfc63009767fd667df51ff10041791268351e1ca
Aleksei Iupinov [Fri, 27 Oct 2017 11:01:19 +0000 (13:01 +0200)]
Bring PME GPU/CUDA internal structure names to CamelCase
This only does mechanical renaming (e.g. pme_gpu_settings_t to
PmeGpuSettings). Any meaningful renames will be done separately.
Change-Id: I7ea2af94fd0212ff6edcf433ff21842c5bbb67b0
Mark Abraham [Tue, 24 Oct 2017 19:59:45 +0000 (21:59 +0200)]
Fix and update hw_info
Stopped using typedef struct (so later we can put a vector into the
struct).
Managed the memory using a unique_ptr, and made the interface reflect
that it is a file static, rather than something that is owned by
e.g. the runner.
Amended docs to clarify the sense of "global."
Change-Id: I1ce9bc42e03668498051b59aaeeb9e50a9f6f762
Aleksei Iupinov [Fri, 27 Oct 2017 13:09:04 +0000 (15:09 +0200)]
Use new/delete for gmx_pme_t
Change-Id: I176b1d26d484514c65cae412c474b65410191d38
Aleksei Iupinov [Thu, 26 Oct 2017 15:04:34 +0000 (17:04 +0200)]
Simplify PME data handling in runner
Differing ownership of the PME data for PME-only and other ranks
is now hidden behind a reference. gmx_pme_init() now returns
a pointer to the allocated structure.
Change-Id: Ia9c5117a0db43a6564298dd621cf9254f0423acf
Aleksei Iupinov [Thu, 26 Oct 2017 14:48:06 +0000 (16:48 +0200)]
Make PME tuning logic more readable
Change-Id: Ie53693a84264ed33c17894aa551cf476a3ced26b
Berk Hess [Sun, 29 Oct 2017 21:12:20 +0000 (22:12 +0100)]
Remove incorrect comment for CHARMM tips3p
Change-Id: I383e28a7b75aa3654a65d15358820a28f9163308
Aleksei Iupinov [Thu, 26 Oct 2017 11:25:23 +0000 (13:25 +0200)]
Remove unused PME grid dump debug functions
Change-Id: Iac748080fdf29e6f35ecf37de2b968e70c72605e
Mark Abraham [Thu, 26 Oct 2017 08:36:13 +0000 (10:36 +0200)]
Fix hw detection more
gmx_hardware_detect was called in response to GoogleTest environment
SetUp function, so the cleanup for its global should occur in response
to the corresponding TearDown function. Both those should be virtual.
Thus the hardwareInfo should not be in a smart pointer called by a
destructor that might be called at a different point from TearDown.
The new getter function and the callback that handles making the first
call to it conform better to GoogleTest's recommendation to arrange to
call AddGlobalTestEnvironment from main() rather than rely on static
initialization.
Made hardwareInit a non-member function because that improves
encapsulation.
Change-Id: I2f8e14ecc1707bf31d023a4eb4fea0a20543910b
Aleksei Iupinov [Thu, 26 Oct 2017 08:36:48 +0000 (10:36 +0200)]
Replace a few asserts with GMX_ASSERT's
Change-Id: I18e614de57fc06f3faabc687140821223bd7c4f4
Aleksei Iupinov [Thu, 26 Oct 2017 11:49:22 +0000 (13:49 +0200)]
Remove defunct PME initialization error code return
The error was never actually returned, and invalid inputs
are already treated with exceptions anyway.
Change-Id: I6063612c3a2e760fb56b7bdf5b1624ab2fc031bd
Mark Abraham [Mon, 9 Oct 2017 11:50:25 +0000 (13:50 +0200)]
Make release matrix work again
Seems we didn't test this matrix when we updated infrastructure some
time ago.
Change-Id: Ib19672db6144bb40f08d2fcace4d43dbd52e6823
Szilárd Páll [Mon, 16 Oct 2017 18:15:25 +0000 (20:15 +0200)]
Reorganize PME GPU launch
Wrapped the first (prep/spread) and second stage (fft/gather) of PME GPU
in functions. Moved the second stage of the regular PME GPU mode to after
the nonbonded x transform to ensure that the transform can overlap with
spread even when the launch overhead of the FFT kernels is high.
Also removed TPI-related PME-GPU launch conditions as this should be
checked much earlier. Noted in the force flags docs that the current
code assumes GMX_FORCE_STATECHANGED is used only with TPI.
Change-Id: I7f765d66c6c4e7e54812b81b2dd23751af0b06b5
Mark Abraham [Wed, 18 Oct 2017 21:02:44 +0000 (23:02 +0200)]
New quote
Change-Id: Id1625b1c836c64a5bd1e24fbf5b3ef2b104f102d
Mark Abraham [Tue, 24 Oct 2017 16:03:24 +0000 (18:03 +0200)]
Teach the copyright checker about template.cpp
Now it won't warn about having to ignore it when calling
uncrustify via the scripts.
Change-Id: I2d5675a1a16dc01f6f9a45440f6807319c766944
Berk Hess [Tue, 24 Oct 2017 19:01:09 +0000 (21:01 +0200)]
Fix gmx check for tprs with different #atoms
Fixes #2279.
Change-Id: I0a56cb30922ba2831bd6177ca6025e15a25dbed6
Aleksei Iupinov [Tue, 24 Oct 2017 14:24:11 +0000 (16:24 +0200)]
Remove duplicate/outdated function declaration
Change-Id: Ie54afcc501e0658c8a29a80831ff87765e5e7786
Aleksei Iupinov [Tue, 17 Oct 2017 11:15:24 +0000 (13:15 +0200)]
Move commrec duty checking into simple getters
This isolates all reads of cr->duty, and asserts on cr->duty being valid,
allowing its assignment to be refactored in later changes.
Change-Id: I9b48be06b8d2db18105619ea1acfe38aa541b622
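A sketch of the getter idea, assuming a bit-flag duty field. CommRecSketch, the flag names, and the getter names are hypothetical; only the pattern (one place that reads the field, guarded by an assertion) follows the message above.

```cpp
#include <cassert>

// Hypothetical duty flags and communication record.
enum DutyFlag
{
    DutyPP  = 1 << 0,
    DutyPme = 1 << 1
};

struct CommRecSketch
{
    int duty = 0; // 0 means "not yet assigned" in this sketch
};

// True once at least one known duty bit has been set.
inline bool dutyIsValid(const CommRecSketch& cr)
{
    return (cr.duty & (DutyPP | DutyPme)) != 0;
}

// The only functions that read cr.duty directly.
inline bool rankHasPPDuty(const CommRecSketch& cr)
{
    assert(dutyIsValid(cr));
    return (cr.duty & DutyPP) != 0;
}

inline bool rankHasPmeDuty(const CommRecSketch& cr)
{
    assert(dutyIsValid(cr));
    return (cr.duty & DutyPme) != 0;
}
```

With all reads funnelled through such getters, how and when the field is assigned can change later without touching the call sites.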
Aleksei Iupinov [Mon, 23 Oct 2017 09:14:19 +0000 (11:14 +0200)]
Correct the page allocator description
Change-Id: Iea0190978b483ed01fc8279f34ef2a304a11b612
Szilárd Páll [Fri, 20 Oct 2017 14:37:49 +0000 (16:37 +0200)]
Eliminate some OCL/CUDA code duplication
Atom to interaction locality conversion and atom range calculation has
been duplicated across the OpenCL and CUDA modules. As an intermediate
step this functionality is now gathered in the common header.
Change-Id: I55b1b34992621ecebed6dad0978a47553511fc87
Berk Hess [Tue, 10 Oct 2017 07:23:30 +0000 (09:23 +0200)]
Fix incorrect dV/dlambda for walls
The free-energy derivative dV/dlambda for walls, which can
be perturbed by changing atom types of non-wall atoms, only
contained the B-state contribution.
Fixes #2267
Change-Id: I7c6d1b57d1e0e173e1461d55855df45c489e082a
David van der Spoel [Tue, 23 May 2017 11:20:30 +0000 (13:20 +0200)]
Split off the NMR related analyses from gmx energy.
A new tool gmx nmr is created by straight copying code from
gmx energy to a new tool. The reason is to reduce complexity.
A few cleanups are introduced to pass the valgrind memory
test.
Added references to gmx nmr in the manual.
Change-Id: I8e4d1dec8806a0518c571d7a01c4f70de5bbbd35
Aleksei Iupinov [Fri, 20 Oct 2017 12:53:01 +0000 (14:53 +0200)]
Change PME GPU gather reduction argument from boolean to enum class
Change-Id: Idacbdfb79313ebf16cf1a7dc19435436d6366d27
Berk Hess [Sat, 21 Oct 2017 19:33:39 +0000 (21:33 +0200)]
Merge "Merge branch release-2016"
Aleksei Iupinov [Tue, 30 May 2017 14:00:35 +0000 (16:00 +0200)]
Add calls to the PME GPU stages
This adds the inactive calls to PME GPU stages both for PP+PME
and PME-only ranks.
Ref #2054
Change-Id: I5af2ab95cedff422c39592255f01205d42fc7eb7
Mark Abraham [Fri, 20 Oct 2017 13:06:58 +0000 (15:06 +0200)]
Clarify docs for Fmax in EM
Change-Id: I388c653b3e277289fa1b1ad0ae9f2679679b9cb8
Mark Abraham [Fri, 20 Oct 2017 10:05:05 +0000 (12:05 +0200)]
Merge branch release-2016
Dropped the post-submit change because where we test
with older clang we now always specify a suitable
status for openmp.
Change-Id: I993da20856861c0b8a0888f7fa0deed8853349a8
Berk Hess [Wed, 18 Oct 2017 18:18:31 +0000 (20:18 +0200)]
Fix warning for confout with periodic molecules
With periodic molecules, mdrun would, incorrectly, attempt to make
molecules whole for writing the final state to confout.
Fixes #2275
Change-Id: Ib19ca5c2ae6fcca6126773bcdd8a05c8e141c3ce
Szilárd Páll [Thu, 19 Oct 2017 16:53:04 +0000 (18:53 +0200)]
Disable OpenMP with clang 3.4 post-submit config
This avoids a CMake warning which we now parse and catch in Jenkins.
Fixes #2277
Change-Id: Id129f9907af32bdecfe07c4ca37d4cb7376d79e2
Szilárd Páll [Tue, 17 Oct 2017 18:29:58 +0000 (20:29 +0200)]
Remove unused OpenCL debugging helpers
This also helps avoid -Wmissing-declarations in OpenCL utils module.
Change-Id: I16584fca485790e98fd3865ac65c06ac78d58194
Berk Hess [Mon, 16 Oct 2017 07:24:05 +0000 (09:24 +0200)]
Made CUDA PME texture reference conditional
Without texture support we should not reference textures.
Also added const struct to textures.
Change-Id: I1ca4e534da7b9130d12fd6831c119d2139eb16eb
David van der Spoel [Tue, 17 Oct 2017 20:26:21 +0000 (22:26 +0200)]
Fixed missing entries in nrnb arrays.
Some nrnb index entries were missing in the interaction_function
array and others were zero, which led to the wrong megaflops
accounting being printed.
Fixes #2274
Change-Id: Ic0b05d30eb5fdfeb7f3e822b42ec7ca4cda58bc5
Mark Abraham [Mon, 16 Oct 2017 06:10:34 +0000 (08:10 +0200)]
Merge branch release-2016
Change-Id: Ia56e987f52e4dee425b12b02940ad9ca18d0c13a
Berk Hess [Sun, 24 Sep 2017 20:27:02 +0000 (22:27 +0200)]
Improve vsite parallel checking
The vsite struct now stores internally whether it has been configured
with domain decomposition. This allows for internal checks on valid
commrec, which have now been added.
The vsite constructor now initializes the atom range to invalid values,
so we can check that the thread splitting has been called before
constructing. This would have caught bug #2257.
Removed the vsite struct from the global construct function argument
list, which simplifies the vsite code in several places and
fixes #2257.
Also some general clean-up: removed some snews, added some camelCasing
and doxygen documentation.
More renaming would be beneficial, but should be a separate commit.
Change-Id: I467ec8b8ebfa0da090d4ac0a1d096ad9fab87eb5
Aleksei Iupinov [Fri, 13 Oct 2017 09:50:19 +0000 (11:50 +0200)]
Relax PME spline computation tolerance in double precision tests
Change-Id: I8c3502dd84e21d20be057d47d4afa589d779eb90
Mark Abraham [Thu, 9 Feb 2017 10:49:38 +0000 (11:49 +0100)]
Update tests for C++11 compiler and standard library
We've started using some more features, so broaden the
range of things for which we check at cmake time.
Also made an explicit error message for older icc that can't handle
newer gcc standard libraries, since this might come up a few times.
Fixes #2116
Change-Id: I3656edb3f7e6f81bbf6ed3ed764bcac56802f87f
Roland Schulz [Wed, 4 Oct 2017 06:48:49 +0000 (23:48 -0700)]
Replace all ConstArrayRef with ArrayRef<const T>
1) Remove the alias itself in arrayref.h.
2) All replacements done automatically using sed:
s#ConstArrayRef<const char \*>#ArrayRef<const char *const>#
s#ConstArrayRef<\(.*\)>#ArrayRef<const \1>#
This worked because "const char*" was the only pointer type used as
template argument.
Change-Id: I5eba895a5dc235b95d77670b4f258e423f64f3b8
Roland Schulz [Fri, 22 Sep 2017 20:43:50 +0000 (13:43 -0700)]
Specialize ArrayRef for SimdReal
ArrayRef<SimdReal> maps to a range of aligned memory and returns a
Simd type from operator[] (more precisely a reference to a Simd type).
This allows iterating over memory without having to explicitly call
load/store while also avoiding undefined behavior (strict aliasing rule)
caused by casting between reals and SimdReals.
Change-Id: I3d00df088669dacc810052cbcaebe15e62e1d530
Magnus Lundborg [Tue, 10 Oct 2017 12:13:45 +0000 (14:13 +0200)]
Do not include headers related to ObservablesHistory
Define destructor for ObservablesHistory to avoid having to include
many extra headers.
Change-Id: I2681b519ace728dc494f967d17db5478af09f5df
Mark Abraham [Tue, 10 Oct 2017 09:22:02 +0000 (09:22 +0000)]
Fix cpuinfo on clang + non-x86
Compilers that pretend to be GCC often define such symbols, and the
support for inline assembly does not compile e.g. on ARM. This broke
CPU detection at cmake time, and subsequent compilation. Probably
introduced by commit 863768a4dad. The latest ARM compiler is based on
clang, so we should fix this.
Also de-duplicated some use of compiler target defines
Change-Id: Ia21363b9c0fe112762750d93b9feea267a34319f
Szilárd Páll [Thu, 12 Oct 2017 15:52:15 +0000 (17:52 +0200)]
Remove the size_t from the PME gather CUDA kernels
Change-Id: If53b9eabc1ac081b33933cc773b5ea932c9e8392
Aleksei Iupinov [Thu, 12 Oct 2017 11:00:59 +0000 (13:00 +0200)]
Remove useless extern CUDA texture reference declarations from PME
These are only accessed from the same compilation unit (pme-spread.cu)
on the device side, and the host side is only using nearby getters.
Change-Id: Ie846193c71142ff5e519e990ef1155b534546a9b
Aleksei Iupinov [Thu, 12 Oct 2017 10:31:45 +0000 (12:31 +0200)]
Revert "Drop NB_ from GMX_CUDA_NB_SINGLE_COMPILATION_UNIT cmake define"
This reverts commit 3880255b0, which was made in confusion
stemming from combination of multiple CUDA compilation units,
disabling CUDA textures, and NB CUDA module structure.
The define in question is actually NB-exclusive,
and PME with CUDA does not need to check it to declare
extern texture references. As PME textures are not accessed
from different PME kernels, those extern declarations are removed
in the child change Ie846193c71142ff5e519e990ef1155b534546a9b.
Change-Id: I75a0e62bc92c7161ba0fbf00d8db2f35cef80bc7
Berk Hess [Sat, 30 Sep 2017 21:10:06 +0000 (23:10 +0200)]
Simplify virial handling
The force and virial are tightly connected. This is now expressed
through the new ForceWithVirial object, which is used for algorithms
that compute a separate virial contribution. This clarifies and
simplifies the core mdrun code in several places.
Change-Id: If0f65f1a6f67fb3efc5e4637a183faf4abd5f969
Roland Schulz [Fri, 6 Oct 2017 23:36:50 +0000 (16:36 -0700)]
Require template parameter for load function
The implicit conversion from load(float*) to both float
and SimdFloat caused multiple issues. The primary ones:
- Extra complexity in the implementation of traits, ArrayRef, SimdReference
- required compiler tests for ambiguity
- SimdReal x = f(load(m)) //confusing broadcast if f is scalar function
- x = s*load(m) //error-prone scalar multiply if s is scalar
New syntax in templated function is load<T>(m) and in non-templated function
load<SimdReal>(m). While this is slightly longer by itself, it is clearer
and doesn't require storing values in temporaries (no ambiguous overload errors).
Also avoids the need for the load proxies.
Change-Id: I8109e9365e956aaea428ec338b6a810444e03d77
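A minimal sketch of the explicit load<T>(m) call syntax, using a plain struct in place of a real SIMD type. FakeSimd4 and LoadImpl are hypothetical, and dispatching through a helper class template is just one way to implement this; it is not claimed to be how the GROMACS SIMD module does it.

```cpp
// Stand-in for a 4-wide SIMD register type.
struct FakeSimd4
{
    float v[4];
};

// Helper class template selected by the requested result type.
template<typename T>
struct LoadImpl;

template<>
struct LoadImpl<float>
{
    // Scalar load: read one value.
    static float apply(const float* m) { return *m; }
};

template<>
struct LoadImpl<FakeSimd4>
{
    // "SIMD" load: read four consecutive values.
    static FakeSimd4 apply(const float* m)
    {
        FakeSimd4 r;
        for (int i = 0; i < 4; ++i)
        {
            r.v[i] = m[i];
        }
        return r;
    }
};

// The caller must name the result type, so there is no implicit conversion
// and no ambiguity between scalar and SIMD results.
template<typename T>
T load(const float* m)
{
    return LoadImpl<T>::apply(m);
}
```

In a function templated on T, load<T>(m) yields whichever type the template was instantiated with; elsewhere the caller writes load<float>(m) or load<FakeSimd4>(m) and the intent is visible at the call site.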
Roland Schulz [Sat, 7 Oct 2017 00:34:02 +0000 (17:34 -0700)]
Use tag for simdLoad
Use same simdLoad name for all types. In preparation
for removing the need for SimdLoadProxyInternal.
C++ doesn't support template specialization for
functions, so giving simdLoad a template argument
and specializing on it doesn't work. By passing a tag as
a 2nd argument, standard overloading can be used.
Change-Id: Iaf42ebb74a3347787bcac3bdfd0ef11db1e333bf
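The tag technique can be sketched as follows. Wide2, Wide4, and their tag types are hypothetical stand-ins for SIMD types; the point is only that two functions with the same name and the same pointer argument, differing in return type, become ordinary overloads once an empty tag parameter is added.

```cpp
// Stand-ins for two SIMD types of different width loaded from the same memory.
struct Wide2
{
    float v[2];
};

struct Wide4
{
    float v[4];
};

// Empty tag types; they carry no data and exist only to select an overload.
struct Wide2Tag
{
};

struct Wide4Tag
{
};

// Same name, same first parameter; the tag picks the return type.
inline Wide2 simdLoad(const float* m, Wide2Tag /*tag*/)
{
    Wide2 r;
    for (int i = 0; i < 2; ++i)
    {
        r.v[i] = m[i];
    }
    return r;
}

inline Wide4 simdLoad(const float* m, Wide4Tag /*tag*/)
{
    Wide4 r;
    for (int i = 0; i < 4; ++i)
    {
        r.v[i] = m[i];
    }
    return r;
}
```

A call like simdLoad(m, Wide4Tag()) is resolved by normal overload resolution on the second argument, and the empty tag object is typically optimized away.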
Mark Abraham [Thu, 13 Apr 2017 23:31:46 +0000 (01:31 +0200)]
Introduced header for communication to/from PME ranks
No functionality changes. This cleans up some structure, and will be
useful for some modernization, use of std::vector, and then new
allocation strategies to suit PME on GPUs.
Eliminated some things in pme-internal.h by moving some declarations
to a header that can be included by the only two source files that are
interested in PP-PME communication. Now gmx_pmeonly() doesn't have to
pass around a large pile of arguments.
Removed a use of typedef struct, and some function parameter types
that no longer need to specify struct in C++.
Removed some unused PP_PME_* constants.
Change-Id: I51629fb6d91b3a486ef24d1f60065e65261d0376
Aleksei Iupinov [Wed, 11 Oct 2017 16:16:07 +0000 (18:16 +0200)]
Fix clang warnings for PME CUDA kernels
Change-Id: I28f67c70b1ff4611f2456a5935a727c49e10e691
Aleksei Iupinov [Wed, 11 Oct 2017 16:28:24 +0000 (18:28 +0200)]
Relax PME solving test complex grid tolerance
PME CUDA solving change (Ic610e7f) tightened the output grid
tolerance from 50 down to 16 ULPs, making one of the LJPME tests
fail in post-submit. This change relaxes the tolerance to 40 ULPs.
Change-Id: Icd0c1aff868e2d1ecb76522a1a2174b3156fc356
Mark Abraham [Fri, 14 Apr 2017 02:23:44 +0000 (04:23 +0200)]
Cleaned up ewaldcoeff for PME-only ranks
Earlier, runner initializes all kinds of PME ranks with the initial
values of Ewald coefficients. The values passed to gmx_pmeonly were
never read - the variables are used only to store new values, which
happens when the PP rank directs the PME grid to switch grids during
load balancing.
Change-Id: Ibe581a7111239f28f874b43dc13dcc6abd025b60
Aleksei Iupinov [Fri, 25 Aug 2017 17:16:13 +0000 (19:16 +0200)]
CUDA 9/Volta support for PME
Change-Id: Icd5cdf16f9118347179dfcbdd162f0cb39cbdd69
Aleksei Iupinov [Tue, 7 Feb 2017 14:01:54 +0000 (15:01 +0100)]
PME solving - CUDA kernel + unit tests
The CUDA implementation of PME solving is added in pme-solve.cu.
The unit tests for PME CPU solving are extended to work with the CUDA kernel,
using the same reference data.
The CUDA solver supports 2 grid dimension orders: YZX and XYZ
(unlike the CPU one which only supports YZX). This is also tested.
Lennard-Jones solving is not implemented.
The tests iterate over all Gromacs-compatible CUDA GPUs.
Refs #2054
Change-Id: Ic610e7f077f39a64089dd9b80df9905094b10459
Paul Bauer [Mon, 9 Oct 2017 07:48:51 +0000 (09:48 +0200)]
Change to modules for build of web documentation
The modules loaded to build the web documentation with Sphinx have been
incorrect for the minimum version specified by the configuration file.
In particular, the imgmath extension had not been available for
version 1.3 that was indicated as being the minimum version. As
there are no references that I found to any math macros in the files
used to build the docs, I removed the extension to make sure it will
build again. It might be better to have a conditional there, building the
docs without imgmath when using lower versions of sphinx, and having it
active for higher versions.
Changed to require 1.4.1 for now, and added variables that set it
automatically from the information passed to cmake.
Change-Id: Ia329575288e5d622b8e679d76b63759bae54a3b0
Aleksei Iupinov [Fri, 27 Jan 2017 14:49:55 +0000 (15:49 +0100)]
PME force gathering - CUDA kernel + unit tests
The CUDA implementation of PME force gathering for PME order 4 is added
in pme-gather.cu. The unit tests for PME CPU force gathering
(d20a5d36) are extended to work with the CUDA kernel, using
the same reference data. The tests iterate over all Gromacs-compatible
CUDA GPUs.
Ref #2054
Change-Id: I162e3a14cb9aa8ddeac17c5ad1ca709df72b8986