Mark Abraham [Wed, 27 Apr 2016 15:51:24 +0000 (17:51 +0200)]
Merge branch release-5-1
Change-Id: I6fdd1b6186911b41d6089c70ee26eff1909fd6e5
Szilárd Páll [Sun, 3 Apr 2016 23:39:48 +0000 (01:39 +0200)]
Fix multiple MPI ranks per node with OpenCL
Similarly to the thread-MPI case, the source of the issue was
the hardware detection broadcasting the outcome of GPU detection
within a node. With MPI, the platform and device IDs, which are
OpenCL-internal entities, differ across processes even if both platform and device(s)
are shared. This caused corruption at context creation on all ranks
other than the first rank in the node (which did the detection).
This change disables the GPU data broadcasting for OpenCL with MPI.
Fixes #1804
Change-Id: I90defdcb3515796c46ba89efb0ed1e3c8b1b35f9
Teemu Murtola [Thu, 21 Apr 2016 18:40:52 +0000 (21:40 +0300)]
Fix issues with failing thread affinity setting
- Fix the conditional in MPI_Reduce() to correctly detect only a subset
of the ranks failing.
- Ensure that all ranks reach all the MPI_Reduce() calls to avoid
deadlocks on heterogeneous nodes, where only some nodes could fail the
consistency checks. As a side effect, always produce the final
message about failed affinity settings, together with its advice.
- Only do MPI_Reduce() if there are multiple ranks.
- Fix incorrect #ifdef (caused by rebasing the original change over the
change that made GMX_MPI 0/1-valued).
The last change seems to fix #1951.
Change-Id: I93c8c4bd6051c9077736f9fc19e6e0637c6d6435
Teemu Murtola [Wed, 27 Apr 2016 12:15:04 +0000 (14:15 +0200)]
Merge "Merge branch release-5-1"
Vedran Miletić [Mon, 25 Apr 2016 14:05:21 +0000 (16:05 +0200)]
Disable CUDA profiler when using OpenCL
Replacing GPU_FUNC_TERM with CUDA_FUNC_TERM generates correct empty
implementations and therefore fixes linker errors.
Change-Id: I6485471eeb22bec9e6f0c3528bff7310593e3be6
Szilárd Páll [Tue, 26 Apr 2016 13:40:51 +0000 (15:40 +0200)]
Fix OpenCL assertion
An incorrect check of the vdw type count would cause an assertion failure
for some types of runs, but not for others.
Change-Id: Ia941f5a5daf37eeabca45fe4c078f0c6327a1d4f
Roland Schulz [Tue, 19 Apr 2016 16:03:49 +0000 (09:03 -0700)]
Fix numerically unstable Gaussian test
Change-Id: If3461ae29584e8afba6d1c8646bfb37aaeb16deb
Berk Hess [Mon, 11 Apr 2016 15:01:41 +0000 (17:01 +0200)]
Fix pull legend with external potential
A typo could make the legend for the pull reference appear in the
pullx output file, while there is no (and should not be any) reference
output.
Change-Id: I1855c5f3e6f16e1447add84bb18e3e1d36399cef
Roland Schulz [Sat, 16 Apr 2016 00:02:37 +0000 (17:02 -0700)]
Fix numerically unstable selection test
Change-Id: Iaea0b6bcbda63b3d8d889a935725a3ece640b9f9
Szilárd Páll [Tue, 12 Apr 2016 13:54:51 +0000 (15:54 +0200)]
Move out CUDA profiler triggers from NBNXN
The profiler triggering is a general functionality that should not be
tied to the nonbonded module. Hence, it is now moved into the gpu_utils
module and called directly at reset/cleanup.
Change-Id: Ifa862dbcbc6386c514dfcc1f6a5169ea6ae8d09f
Berk Hess [Fri, 27 Nov 2015 14:38:32 +0000 (15:38 +0100)]
Allow rcoulomb>rvdw with PME
We have Coulomb PME + LJ cut-off kernels that support rcoulomb>rvdw
or PME load balancing. Now we allow this setup to be chosen also
through mdp options. This is mainly a straightforward change, only
the Verlet buffer calculation needed to be adapted to support
different buffer lengths for LJ and Coulomb.
Change-Id: I9d8176656435fbdb4ec48b3dd2c7128d13c5ef07
Mark Abraham [Mon, 15 Feb 2016 10:21:38 +0000 (11:21 +0100)]
Replace libxml2 with tinyxml2 for use in test code
Building libxml2 on complex machines has been problematic for users
and developers, and there is no pressing reason to need the full
capabilities of libxml2 for reading and writing data for test
cases. Support for detecting libxml2 is retained, because other code
in Gerrit might plan to use it, but it currently is not needed for
anything.
This is version 3.0.0 of tinyxml2, modified only so that
XMLPrinter::PrintSpace will write XML files with nesting indentation
of 2 spaces rather than 4, to match existing XML files in the GROMACS
repo. Also, a cppcheck suppression was added, and a fix was proposed
upstream that helps avoid it.
The resulting XML is identical in format to the current form in the
repo, except that empty text fields are rendered with
'<string></string>' rather than using the '<string/>'
shorthand. Those have been regenerated.
One known limitation is that '\r' is not handled correctly in text
nodes (string or text block), as it is converted to '\n'. libxml2 did
not handle it correctly in text block either, and we don't need this
behaviour to work, so it is no longer tested for either case.
Another limitation is that a non-CDATA text element that is not empty
and contains only whitespace is written correctly by TinyXML2, but is
parsed to an empty element upon reading. So GROMACS cannot rely on
refdata that wants to write a whitespace-only non-empty String (a
TextBlock is OK), which we might want to do some time. Attempting to
do so throws an exception, so we can't inadvertently run into trouble.
Some of the helper code that reports how strings in refdata do not
match has been slightly improved, so that the string that doesn't
match is wrapped in single quotes, so that the manner in which
whitespace is not matched can be more easily seen on the terminal.
Refs #1947
Change-Id: I6153136b67b41e7141fc3f1fc8ee4005a72c90f1
Roland Schulz [Fri, 15 Apr 2016 20:16:37 +0000 (13:16 -0700)]
Tune nstlist increase for KNL
Change-Id: If4162c307fd9b31f7a76a18ee8a1fa45830559e1
Mark Abraham [Tue, 19 Apr 2016 00:02:13 +0000 (02:02 +0200)]
Fix parse_digits_from_string
The new GPU-passing functionality did not check for an empty string,
unlike the functionality that it extended. Noted a TODO for the future.
Change-Id: I587e526802a4e880fa966836b555fb07e2b3ad30
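Note on the parse_digits_from_string fix above: a minimal sketch of the kind of empty-string guard the message describes. The name parseDigits and its behaviour (one ID per digit character) are assumptions made for illustration, not the actual GROMACS implementation.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a string such as "0011" into {0, 0, 1, 1}.
// An empty string yields an empty vector instead of being processed unchecked.
std::vector<int> parseDigits(const std::string &digits)
{
    std::vector<int> ids;
    if (digits.empty())
    {
        return ids; // nothing to parse; the caller decides what that means
    }
    ids.reserve(digits.size());
    for (char c : digits)
    {
        if (!std::isdigit(static_cast<unsigned char>(c)))
        {
            throw std::invalid_argument("invalid character in ID string: " + digits);
        }
        ids.push_back(c - '0');
    }
    // Invariant: exactly one ID per input character.
    assert(ids.size() == digits.size());
    return ids;
}
```

Handling the empty case up front keeps the extended code path consistent with the one it was built on.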
Berk Hess [Thu, 7 Apr 2016 17:26:52 +0000 (19:26 +0200)]
Broadcast pull external potential provider string
The pull external potential change was added without broadcasting
the string that tells what the external potential provider is.
Note that all other strings in inputrec are in symtab, but this one
is not.
Change-Id: I00959754bfe9ad96fe2c0b7682514f893df78f54
Mark Abraham [Thu, 14 Apr 2016 12:34:36 +0000 (14:34 +0200)]
Fix mdp label generation
The extra colons meant the rst didn't generate
the label that is expected.
Change-Id: Icbb5bb8aa674f043ae45e474ced8e6f5706589b4
Berk Hess [Tue, 5 Apr 2016 11:49:41 +0000 (13:49 +0200)]
Fix gmx hbond group overlap check
gmx hbond does not support partially overlapping analysis groups.
The check in the code was broken and never caught this, resulting
in incorrect output that might look OK at first sight.
Also corrected the enums used as bitmasks, which (intentionally?) seemed to
give correct results only because no non-power-of-2 enum entries were used.
Change-Id: Ic1643f1d552e35d35873885a3cdf49f19ec66ae3
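Note on the gmx hbond fix above: a minimal sketch of why enum entries used as bitmasks should be distinct powers of 2, together with a partial-overlap check. The names GroupFlag, inGroupA, inGroupB and groupsPartiallyOverlap are hypothetical and do not come from the gmx hbond source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical membership flags; each entry is a distinct power of 2 so that
// the flags can be combined with | and tested independently with &.
enum GroupFlag : unsigned
{
    inGroupA = 1u << 0,
    inGroupB = 1u << 1
};

static_assert((inGroupA & inGroupB) == 0, "flags must not share bits");

// Returns true when the two groups are neither disjoint nor identical,
// i.e. some atoms are in both groups while others are in only one.
bool groupsPartiallyOverlap(const std::vector<unsigned> &atomFlags)
{
    bool anyShared    = false;
    bool anyExclusive = false;
    for (unsigned flags : atomFlags)
    {
        const bool inA = (flags & inGroupA) != 0;
        const bool inB = (flags & inGroupB) != 0;
        assert(inA || inB); // the sketch only lists atoms that are in a group
        if (inA && inB)
        {
            anyShared = true;
        }
        else
        {
            anyExclusive = true;
        }
    }
    return anyShared && anyExclusive;
}
```

With sequential values such as 0, 1, 2, 3 the entry 3 would be indistinguishable from 1 | 2, which is the kind of silent error that power-of-2 values rule out.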
Teemu Murtola [Wed, 16 Dec 2015 10:47:47 +0000 (12:47 +0200)]
Make thread affinity failures always end up in log
Remove calls to md_print_warn(NULL, fplog, ...), which were used for
cases where only some of the ranks could fail. But if a non-master rank
failed, the error went only to stderr, not into the log file. Make this
work more uniformly such that the error always ends up in the log file.
The approach could possibly be generalized (it is now local to
threadaffinity.cpp, and only works for warnings where the text is the
same on each rank), but that is probably easier after the logging is
using C++.
Add some trailing newlines for consistent output from md_print_warn().
This also makes all md_print_warn/info calls use the same pattern, which
makes things easier to understand, and allows replacing them with a
simple object.
Related to #1083.
Change-Id: I03a3524ed883bed0c5b039836e9d1741c672d97d
Mark Abraham [Thu, 14 Apr 2016 08:33:03 +0000 (10:33 +0200)]
Reduce compilation coupling of SIMD module
Analysis tools that call bonded functions should not have to include a
header that unnecessarily includes the SIMD module.
Change-Id: I116429b4fc9f6e0a67735d58ac8ddd4aa9b78a1e
Roland Schulz [Fri, 15 Apr 2016 18:43:09 +0000 (11:43 -0700)]
Fix AVX512 suggestion
Change-Id: Id8a9d4b9c9886e4a023f3d8a9a035dc324029683
Roland Schulz [Thu, 8 Oct 2015 21:03:52 +0000 (17:03 -0400)]
Remove idioms deprecated in C++11
- use noexcept instead of throw()
- don't rely on default copy/assign if either one or the destructor is
explicitly defined
Enforced with clang -Wdeprecated
Change-Id: I4d2f32d6c3e880092fb27d81834c458f788a9e3f
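Note on the C++11 deprecation change above: a minimal sketch of the two idioms named in the message. The Buffer class is hypothetical and exists only to show noexcept in place of throw(), and explicitly written copy operations next to a user-defined destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical owning buffer, used only to illustrate the two idioms.
class Buffer
{
public:
    explicit Buffer(std::size_t size) : size_(size), data_(new char[size]()) {}

    // A destructor is defined, so the copy operations are written out too;
    // relying on the implicitly generated ones is deprecated in C++11.
    ~Buffer() { delete[] data_; }

    Buffer(const Buffer &other) : size_(other.size_), data_(new char[other.size_])
    {
        assert(other.data_ != nullptr); // a constructed Buffer always owns memory
        std::memcpy(data_, other.data_, size_);
    }

    Buffer &operator=(const Buffer &other)
    {
        if (this != &other)
        {
            char *copy = new char[other.size_];
            std::memcpy(copy, other.data_, other.size_);
            delete[] data_;
            data_ = copy;
            size_ = other.size_;
        }
        return *this;
    }

    // C++11 spelling; the deprecated form was "size() const throw()".
    std::size_t size() const noexcept { return size_; }

private:
    std::size_t size_;
    char       *data_;
};
```

Without the explicit copy operations, a copied Buffer would share data_ with its source and both destructors would free the same allocation.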
Berk Hess [Thu, 14 Apr 2016 21:01:07 +0000 (23:01 +0200)]
Add QMMM checks
QMMM only works with cutoff-scheme=group and dynamics. grompp and
mdrun now check for this.
Also fixed potential print of NULL string in init_gaussian.
Fixes #1940.
Change-Id: I8215e339070d811ba07d17d743061b18b665a33b
Mark Abraham [Thu, 14 Apr 2016 09:23:52 +0000 (11:23 +0200)]
Merge branch release-5-1
Several changes from release-5-1 have been fixed separately
in master; such changes are omitted here.
Change-Id: Ib03a2091ad6f76e785f7a6cc9ea91493d246da3b
Carsten Kutzner [Mon, 4 Apr 2016 11:39:15 +0000 (13:39 +0200)]
Corrected value of electric conversion factor f in the manual
- As suggested by Christopher Neale
- Made a new latex command for the value so that it can appear
consistently throughout all uses in the .tex manual
- changed the value to f = 138.935 458(9), which is value and
standard deviation calculated from NIST 2010 CODATA, as also
used by GROMACS in units.h (full value calculated from
numbers in units.h is f = 138.935 457 839 ... )
- If I calculated correctly, the pre-June-2015 value of the std-dev
was correct, just the last two digits of the value itself
were swapped in some places 58(9) <-> 85(9)
- I believe commit 1ee3b15530a5 accidentally overwrote the std-dev
with digits from the value itself
Change-Id: I293768b657a81af0a4dc0865db8c4a1b0eebd4ad
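Note on the conversion factor above: a minimal sketch that recomputes f = e^2 N_A / (4 pi eps0) in kJ mol^-1 nm e^-2. The constants are the CODATA 2010 values as I recall them, and the constant and function names are my own; the way units.h actually arranges the calculation is not shown in the commit message.

```cpp
#include <cassert>
#include <cmath>

// CODATA 2010 values in SI units (typed in for this sketch).
constexpr double c_elementaryCharge = 1.602176565e-19; // C
constexpr double c_avogadro         = 6.02214129e23;   // 1/mol
constexpr double c_epsilon0         = 8.854187817e-12; // F/m

// Electric conversion factor f in kJ mol^-1 nm e^-2.
double electricConversionFactor()
{
    const double pi = std::acos(-1.0);
    // e^2 N_A / (4 pi eps0) has units of J m / mol.
    const double joulePerMolMetre =
            c_elementaryCharge * c_elementaryCharge * c_avogadro / (4.0 * pi * c_epsilon0);
    // Convert m to nm (factor 1e9) and J to kJ (factor 1e-3).
    const double f = joulePerMolMetre * 1e9 * 1e-3;
    assert(f > 0.0);
    return f;
}
```

The result agrees with the corrected 138.935 458 to the digits quoted, and not with the swapped 138.935 485.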
Szilárd Páll [Wed, 13 Apr 2016 19:08:03 +0000 (21:08 +0200)]
Spelling fixes
Thanks to: Nicholas Breen / DebiChem
https://anonscm.debian.org/viewvc/debichem?view=revision&revision=6293
Change-Id: I5bfecc4cc623276b65cd2dd6e455eff697d4a227
Szilárd Páll [Wed, 13 Apr 2016 16:51:00 +0000 (18:51 +0200)]
Do not create empty man/man7 directory
Thanks to: Nicholas Breen / DebiChem
Change-Id: I42d0bf14ea774c623ccf9af687ac54bbe634da4c
Roland Schulz [Wed, 9 Mar 2016 02:21:35 +0000 (18:21 -0800)]
AVX512 transposeScatterIncr/DecrU with load/store
Also remove one more mask
Change-Id: I0d27eb42eead92d2f50725b2f3831af7e9ee229e
Erik Lindahl [Sun, 3 Apr 2016 12:20:07 +0000 (14:20 +0200)]
Fix hwloc narrowing warning on 32-bit architectures
hwloc always uses a 64-bit integer for cache size, while
size_t might be 32 bits on a 32-bit architecture. Since we
do not expect any 32-bit architectures to come with more
than 4GB of cache, we simply cast it explicitly to size_t.
Change-Id: I35b064de60a6b20dcf9ab6af751b4721037e271c
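Note on the hwloc narrowing fix above: a minimal sketch of an explicit 64-bit to size_t cast. The function name cacheSizeAsSizeT is hypothetical, and the assert only documents the assumption stated in the message that real cache sizes fit into size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical wrapper: narrows a 64-bit cache size, as a library such as
// hwloc reports it, to size_t with an explicit cast.
std::size_t cacheSizeAsSizeT(std::uint64_t cacheSizeInBytes)
{
    // On a 32-bit target size_t may be 32 bits wide; the value is assumed to fit.
    assert(cacheSizeInBytes <= std::numeric_limits<std::size_t>::max());
    return static_cast<std::size_t>(cacheSizeInBytes);
}
```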
Szilárd Páll [Mon, 4 Apr 2016 19:17:03 +0000 (21:17 +0200)]
Mark more variables as advanced
CMake 3.x introduced a few more non-advanced CUDA cache variables that
most users do not care about, so this commit hides them.
Refs #1764
Change-Id: I926b03ece080d8f8255e37dc5471275d48956d49
Erik Lindahl [Sun, 3 Apr 2016 18:43:54 +0000 (20:43 +0200)]
Fix compile error for IBM VMX on ppc64
gcc-5.1 does not provide vec_mul(), only vec_madd().
This has already been fixed in the master branch with
the new implementation.
Fixes #1812.
Change-Id: I02c691b16bc0827cfa799aa99fd4fc7b84045e3f
Erik Lindahl [Sun, 3 Apr 2016 19:52:08 +0000 (21:52 +0200)]
Fix SIMD detection on new AMD AVX CPUs w/o fma
We earlier assumed that all AMD CPUs had fma
support and could use AVX_128_FMA, but with this
change we properly detect it and revert to
AVX_256 otherwise.
Fixes #1906 for release-5.1.
Change-Id: I803a6bfa75c5069b73023688a477cfca03c1ddcd
Berk Hess [Thu, 17 Mar 2016 21:18:01 +0000 (22:18 +0100)]
Correct nrdf for 1D/2D systems
With COMM removal, grompp would subtract degrees of freedom also for
VCM groups with fully frozen dimensions, i.e. 1D/2D systems.
Also fixed division by zero for groups with #DOF=0 with VV.
Fixes #1923.
Change-Id: I0ba2535df0495947d9bbb6ee1ef29f519635c221
Erik Lindahl [Sun, 3 Apr 2016 15:12:38 +0000 (17:12 +0200)]
Fix unit test segfaults with gcc-6.0
Work around a bug in gmock-1.7 by adding the flag
-fno-delete-null-pointer-checks for the gmock
and testutil sources.
Fixes #1911.
Change-Id: I3e2d3fde99898191437d9eed986659f6f6fb056a
Erik Lindahl [Sun, 3 Apr 2016 12:03:14 +0000 (14:03 +0200)]
Fix incorrect return type for scalar version of simd andNot
The return type for the integer version was incorrectly set
to float. Fixed by properly adding separate integer, float
and double versions. This has never been used in any higher
level code yet, so it won't have caused any problems.
Refs #1930.
Change-Id: I84a4dfe51cf898664631ec2b0313ecf8e11a4a3a
Semen Yesylevskyy [Fri, 1 Apr 2016 12:43:14 +0000 (15:43 +0300)]
Added references to ProtSqueeze and g_membed to the user guide
Change-Id: Ia05a3710eb170173bae9257993f05d2377c02db6
Roland Schulz [Fri, 1 Apr 2016 20:13:37 +0000 (13:13 -0700)]
Fix coolquotes
The unseeded generator produced always the same number.
Change-Id: I7ce678e61b75aa789e7535bfedb0021c1f2117e7
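Note on the coolquotes fix above: a minimal sketch of the failure mode, using std::mt19937 as a stand-in for whichever generator GROMACS uses. The function names are hypothetical.

```cpp
#include <cassert>
#include <random>

// Picks an index in [0, count) from a default-constructed engine. The engine
// always starts from the same default seed, so every call returns the same index.
int pickUnseeded(int count)
{
    assert(count > 0);
    std::mt19937                       engine; // default seed, identical on every run
    std::uniform_int_distribution<int> dist(0, count - 1);
    return dist(engine);
}

// Same choice, but the engine is seeded from std::random_device, so the
// index is expected to vary between calls on typical platforms.
int pickSeeded(int count)
{
    assert(count > 0);
    std::random_device                 device;
    std::mt19937                       engine(device());
    std::uniform_int_distribution<int> dist(0, count - 1);
    return dist(engine);
}
```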
Mark Abraham [Fri, 1 Apr 2016 10:30:32 +0000 (12:30 +0200)]
Revert to treating -1 as "generate a seed" in .mdp
Commit 2d0247f6 changed the grompp behaviour while refactoring the rng
seed setup, without mentioning it in the docs or commit message.
Several of the tools also had their functionality changed, but in most
cases there was no previous documentation and now there is
some. However, we can't change grompp behaviour without catering for
all the scripts whose behaviour would change.
This change restores the previous functionality for grompp.
Change-Id: Id77098e1f03ebf96207f8c9908b2dada28c59e6a
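Note on the seed handling above: a minimal sketch of treating -1 as a request to generate a seed. The helper name resolveSeed and the use of std::random_device are assumptions made for illustration, not the grompp code.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: -1 asks for a generated seed, any other value is
// used exactly as given.
std::int64_t resolveSeed(std::int64_t requestedSeed)
{
    if (requestedSeed != -1)
    {
        return requestedSeed;
    }
    std::random_device device;
    // device() yields an unsigned 32-bit value, so the generated seed is
    // non-negative and can never collide with the -1 sentinel.
    const std::int64_t generated = static_cast<std::int64_t>(device());
    assert(generated != -1);
    return generated;
}
```

Scripts that pass an explicit seed keep reproducible behaviour, while scripts that pass -1 get a fresh seed on each invocation.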
Berk Hess [Thu, 31 Mar 2016 21:57:08 +0000 (23:57 +0200)]
Fix includes in readpull.cpp
Change-Id: Iaf055fc196a17dd59795c6579caa0afc210a0c82
Berk Hess [Wed, 23 Mar 2016 13:57:40 +0000 (14:57 +0100)]
Add option for external pull potential
A module other than the pull code can now provide the potential
and force for a pull coordinate, when the pull coordinate type is set
to external-potential. It is the responsibility of the module to call
a function that applies the scalar force to the atoms in the groups
involved in the pull coordinate. A registering function has to be
called by the provider module at setup. This function checks the
consistency of the pull setup and checks a provider string the user
provided against the string of the providing module.
This functionality is intended to be used by the AWH module.
Removed the set_pull_coord_reference_value() function. Instead of
externally modifying the reference value, an external pull potential
should be used.
Change-Id: I231fe43fe3a2148a7a77d979b0fdb234d3de5c9b
Mark Abraham [Mon, 21 Mar 2016 06:05:40 +0000 (07:05 +0100)]
Add test and improved docs for custom checkSequence
I tried to use this, but ran into problems if I passed a non-unique id
that was not NULL to the compound checker. Passing NULL is the
behaviour for the implementation of checkSequenceArray, and that seems
to be often appropriate here, too.
Change-Id: I9b02e4632085ad995ef67bbb52959f1533d56c34
James W. Barnett [Wed, 9 Mar 2016 18:28:15 +0000 (12:28 -0600)]
Revert behavior for handling SETTLE errors
SETTLE errors were changed to fatal in 5.1.2. This causes problems with
minimization of some systems, since a SETTLE error is likely to occur at the
beginning of the minimization. This commit reverts that such that
it only issues a warning as before.
Fixes #1915.
Change-Id: I43ea1878cb61aecf5ae2c5ebcc67fcbda27ba50f
Szilárd Páll [Wed, 9 Mar 2016 13:35:05 +0000 (14:35 +0100)]
refactor CUDA texture-based table initialization
Also makes conditional the allocation of non-bonded parameter table and
related CUDA texture, skipping it when combination rule kernels are
used.
Change-Id: I6f7e25ff3906d8f92da22d2a227292bcb9e9fc74
Szilard Pall [Sat, 20 Jun 2015 23:23:44 +0000 (01:23 +0200)]
avoid CPU oversubscription with GPUs
If the user specifies the number of OpenMP threads on the command line N
such that N*#GPU > #hardware threads, mdrun will determine the number of
thread-MPI ranks to start based on the number of GPUs without taking
into account the total number of threads that such a run would use. This
will result in CPU oversubscription.
This change will reduce the number of tMPI threads started to prevent
oversubscription.
Fixes #1338
Change-Id: I6bceaa2230686fee2e410caf8ae97c43847a5c28
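Note on the oversubscription fix above: a minimal sketch of one policy that satisfies the stated condition, N*#ranks <= #hardware threads. The function name and the exact rounding are assumptions; the logic mdrun actually uses may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical policy: start one thread-MPI rank per GPU unless that would
// use more threads in total than the hardware provides.
int numRanksWithoutOversubscription(int numGpus, int numOpenMPThreadsPerRank, int numHardwareThreads)
{
    assert(numGpus > 0 && numOpenMPThreadsPerRank > 0 && numHardwareThreads > 0);
    // Largest rank count whose total thread count still fits; at least one rank.
    const int maxRanks = std::max(1, numHardwareThreads / numOpenMPThreadsPerRank);
    return std::min(numGpus, maxRanks);
}
```

For example, with 4 GPUs, 4 OpenMP threads per rank and 8 hardware threads, the sketch starts 2 ranks instead of 4.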
Szilárd Páll [Fri, 18 Mar 2016 01:12:18 +0000 (02:12 +0100)]
Fix multiple tMPI ranks per OpenCL device
The OpenCL context and program objects were stored in the gpu_info
struct which was assumed to be a constant per compute host and therefore
shared across the tMPI ranks. Hence, gpu_info was initialized once
and a single pointer to the data was used by all ranks.
This led to the OpenCL context and program objects of different ranks
sharing a single device getting overwritten/corrupted by one another.
Notes:
- MPI still segfaults in clCreateContext() with multiple ranks per node
both with and without GPU sharing, so no changes on that front.
- The AMD OpenCL runtime overhead with all hw threads used is quite
significant; as a short-term solution we should consider avoiding
using HT by launching fewer threads (and/or warning the user).
Refs #1804
Change-Id: I7c6c53a3e6a049ce727ae65ddf0978f436c04579
Szilárd Páll [Tue, 1 Mar 2016 20:30:18 +0000 (21:30 +0100)]
LJ combination rule kernels for OpenCL
The current implementation enables combination rules for both AMD and
NVIDIA OpenCL (also ports the changes to the "nowarp" test/CPU kernel).
Like in the CUDA implementation, all kernels support it, but only for
plain cut-off are combination rules used.
Notes:
- On AMD tested on Hawaii, Fiji, Spectre and Oland devices;
combination rules in all cases improve performance, although combined
with the i-prefetching, the improvement is typically only ~10%.
- On NVIDIA tested on Kepler and Maxwell; in most cases the combination
rule kernels are fastest.
However, with certain inputs these kernels are 25% slower on Maxwell
(e.g. pure water box, cut-off LJ, pot shift), but not on Kepler.
This is likely a compiler mis-optimization, so we'll just leave the
defaults the same as AMD.
Change-Id: I05396e000cdf93c1d872729e6b477192af152495
Szilárd Páll [Fri, 11 Mar 2016 18:05:47 +0000 (19:05 +0100)]
Re-enable i-atom type local mem prefetch in OpenCL
For reasons unknown this has been disabled in the original OpenCL
implementation. However, it turns out that prefetching does have
substantial performance benefits, especially on AMD (>10%) and in some
cases on NVIDIA too (although not on Maxwell).
This change re-enables prefetching code-path and turns it on
for AMD devices. For NVIDIA the decision will be revisited later.
The GMX_OCL_ENABLE_I_PREFETCH/GMX_OCL_DISABLE_I_PREFETCH environment
variables allow testing prefetching with future architectures/compilers.
Change-Id: I8324d62d3d78e0a1577dd3125edf059d3b311c2f
Berk Hess [Wed, 30 Mar 2016 09:34:02 +0000 (11:34 +0200)]
Correct mdrun tMPI (non-)parallel error message
Fixes #1931
Change-Id: Ifad46c7f62099a2cd80d70ccbe46bf3f2b5751e0
Teemu Murtola [Tue, 29 Mar 2016 19:08:49 +0000 (22:08 +0300)]
Make includesorter.py work with Doxygen 1.8.10+
Newer Doxygen versions write relative paths to the XML output, so make
the filtering that limits what to parse work with relative paths, using
XML entries that should give a relative path with all Doxygen versions.
Change-Id: I6172c625cd7e5293764b9a4dfeacf1c7de10ce78
Szilárd Páll [Tue, 29 Mar 2016 15:25:25 +0000 (17:25 +0200)]
Describe AMD APPSDK 3.0 issues in the guide
No better solution has been found to the issue, so we can only warn
users and provide advice for a workaround.
Fixes #1921
Change-Id: I4f6c8efa72e02d39ccb59977a2093802c0590563
Teemu Murtola [Tue, 29 Mar 2016 15:50:52 +0000 (18:50 +0300)]
Fix error reporting for documentation build
Make explicit file handling in the documentation build work, now that
the current working directory is managed separately. In some cases,
this could cause missing some warnings (probably mostly from
check-source) and marking a build incorrectly as successful. However,
the main effect was that reporting of the unsuccessful reason back to
Gerrit was less than ideal.
Fix issues that had been introduced while the checking was not working.
This can possibly be improved further by adding methods in releng for
wrapping all this file access, but that can wait for later, once there
is a bit of time to design a good, stable API for that.
Change-Id: I2f1dc927a2134365dba15a6b7dcec1a754b58433
Mark Abraham [Fri, 18 Mar 2016 22:53:16 +0000 (23:53 +0100)]
Clean up vsite code
No functionality changes. Added some const correctness. Reduced the
scope of a lot of variables.
Change-Id: Ia6caba6686ecf9348223932f7cc812e601d0109c
Teemu Murtola [Sat, 16 Jan 2016 05:09:13 +0000 (07:09 +0200)]
More support for release workflow
- Add a matrix used for the deployment tests for tarballs.
- Make gromacs.py support builds with static linking, make it
possible to run the tests (including regression tests) with 'make
check', and do an install with a normal + mdrun-only build.
- Adapt the website build script to releng changes, and do not run all
the Doxygen targets twice for the website build.
Some TODOs in gromacs.py identify what would still be necessary to get
to the level the Test_Tarballs_* jobs currently do, but that can also be
improved separately. Now this, together with the releng changes,
produces something useful reasonably robustly.
Change-Id: Idca8974c5c9e2ee20e04f0a309e0f306a9c615d6
Teemu Murtola [Sat, 26 Mar 2016 05:40:38 +0000 (06:40 +0100)]
Ignore empty GMXLIB
If the GMXLIB env.var. was set, but empty, it triggered an assert
(at least) in pdb2gmx. Now this case is silently handled as if
the variable wasn't set at all.
Part of #1928.
Change-Id: I51646ec659d119b20b7cb429f9292bffa5775aaa
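Note on the GMXLIB fix above: a minimal sketch of treating a set-but-empty environment variable the same as an unset one. The helper names are hypothetical; only std::getenv is a real library call.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// True only when the variable is set and holds a non-empty value.
bool isUsableValue(const char *value)
{
    return value != nullptr && value[0] != '\0';
}

// Hypothetical lookup: returns the variable's value, or an empty string when
// the variable is unset or set to an empty value.
std::string valueFromEnvironment(const char *name)
{
    assert(name != nullptr);
    const char *value = std::getenv(name);
    return isUsableValue(value) ? std::string(value) : std::string();
}
```

A caller such as valueFromEnvironment("GMXLIB") then needs only one check, on the returned string, to decide whether to fall back to the default search path.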
Berk Hess [Thu, 11 Feb 2016 08:55:44 +0000 (09:55 +0100)]
Add real/SIMD template for LJ-14 pairs
Added a real/SIMD templated code-path for plain LJ 1-4 pair
interactions without free-energy, virial and energy.
We should add virial and energy, but the virial requires a different
treatment of the shift forces, that we should introduce at the same
time for angles and dihedrals.
This gives a speed improvement of a few factors. The main improvement comes
from simplified analytical LJ instead of tables; SIMD helps a bit.
Change-Id: Ifdefd875e32f6c5b7d609c3bd66a864fe77fc13e
David van der Spoel [Wed, 23 Mar 2016 13:04:00 +0000 (14:04 +0100)]
Removed srenew that was making array smaller incorrectly.
An array y was first allocated to 6 entries, then reduced
to 3 entries using srenew, but after that the indices
3, 4 and 5 were used anyway. Removing the srenew fixes the problem.
Fixes #1920
Change-Id: I3f35f0af3f56d33435bf54a9c7cda6273fb3f05a
Berk Hess [Wed, 16 Mar 2016 12:45:03 +0000 (13:45 +0100)]
Avoid assertion failure in pme_loadbal_do
When ir->init_step % ir->nstlist != 0 an assertion would fail in
pme_loadbal_do. If it does not fail every time, the assertion can safely
be replaced by a conditional return, which is now done in this case.
Fixes #1870.
Change-Id: Ie586022c87d3993f2de5a6fdf9210ae56aa7855b
Berk Hess [Fri, 26 Feb 2016 08:38:40 +0000 (09:38 +0100)]
Make gmx vanhove work without PBC
Change-Id: I37d59fc7edf6d6da36778ac3f342c71a876e2872
Carsten Kutzner [Tue, 1 Mar 2016 14:16:02 +0000 (15:16 +0100)]
Clean-up code for swapcoords.
Fixes one of the TODOs left from change gerrit.gromacs.org/#/c/5660/
Change-Id: If9f443447f2cfd7fb757a386447a51a9f958dd41
Berk Hess [Tue, 23 Feb 2016 18:51:48 +0000 (19:51 +0100)]
LJ combination rule kernels for CUDA
Implemented LJ combination rule parameter lookup in the CUDA kernels
and enabled it for plain LJ cut-off, as was already present in the
SIMD kernels.
Change-Id: Ic1222e9433eb21e8bd1fb81658ebbbfe42c1d2c2
Berk Hess [Fri, 18 Mar 2016 16:34:01 +0000 (17:34 +0100)]
Moved pull initialization back to mdrunner
Commit 29943fe5 moved the init_pull and init_rot calls from
mdrunner() to do_md(), but both are needed in energy minimization
as well and init_pull should be called before init_constraints().
Added an assertion to init_constraints() for pull initialization.
Fixes #1924.
Change-Id: I9420940f772afc2be93416619f464a6ef7472372
Mark Abraham [Tue, 15 Mar 2016 17:06:15 +0000 (18:06 +0100)]
Add more const correctness
This helps reduce the size of a future change to how we handle reading
and writing conformation files.
In particular, the way we write symmetric low-level I/O routines that
do either reading or writing according to the value of a parameter
requires that parameters for values to be written are passed as
non-const at some level of the call stack. In several cases, this
change moves that point lower down, so that routines whose job is
writing complex data to files take const parameters. Improving this
helps understand what is going on when e.g. an analysis tool reads in
coordinates, perhaps modifies them, and then writes them. The down
side is that we need to use some const_cast as we get close to the
low-level routines.
Change-Id: I5c22d8681fa5f247f302409bf67b062ebc9fe766
Mark Abraham [Thu, 25 Feb 2016 15:55:15 +0000 (16:55 +0100)]
Fix warning from GCC 6
These omega variables could not actually be used when uninitialized
but the code is anyway improved by this change.
Change-Id: Ic038ecf1f5ae1ff1f43bbc77b6de023d94b020b2
Mark Abraham [Fri, 12 Feb 2016 09:18:49 +0000 (10:18 +0100)]
Improved some mdrun integration tests
Made some better naming.
Avoided using the whole of OPLS/AA just to get water parameters, which
may have been unstable when calling grompp many times in a binary, for
reasons unknown.
Removed warning suppressions for unsupported icc compiler versions.
Change-Id: I5107fca64f6a73e3c0b85146f28347453ded705a
Mark Abraham [Tue, 15 Mar 2016 14:08:03 +0000 (15:08 +0100)]
Warn about OpenMPI 1.8.6
Refs #1897
Change-Id: I529e59f095bb38a35569da57d395658e038e3f8f
Berk Hess [Wed, 16 Mar 2016 09:04:31 +0000 (10:04 +0100)]
Fix non-normalization of FFT autocorr
The function many_auto_correl incorrectly divided by the number of
data points, leading to incorrect absolute values of autocorrelation
functions, as e.g. output by gmx analyze -ac -nonormalize.
Also simplified the FFT size calculation.
Added a test for -nonormalize functionality.
Fixes #1914.
Change-Id: If0961c91d6e0a78f88e4390ce55b2a56143dd5e5
Mark Abraham [Tue, 15 Mar 2016 12:48:31 +0000 (13:48 +0100)]
Fix typo in Fourier dihedral equation
This now matches the version in the preceding subsection, and seems
consistent with those equations and the code in convparm.c.
Change-Id: Ied7b74d07463f36ad6b1aadfb1c5f4a5b9e776a7
Viveca Lindahl [Mon, 21 Sep 2015 16:18:38 +0000 (18:18 +0200)]
Added pull coordinate geometry angle-axis
The new geometry is described in the docs.
Some checks in readpull.cpp were reorganized since adding new
geometries made some old logic a bit convoluted.
Change-Id: I310891740d65e3c47948a06726e7a9e7c974bc44
Viveca Lindahl [Wed, 10 Feb 2016 20:29:31 +0000 (21:29 +0100)]
Added pull coordinate geometry dihedral (angle)
How to use the new geometry is explained in the docs.
Change-Id: Ic5fc106df923857e5e94409583e3793dcd7f9fb8
Viveca Lindahl [Wed, 15 Apr 2015 23:04:58 +0000 (01:04 +0200)]
Added pull coordinate geometry angle.
A new subsection was added to the docs explaining the new geometry.
Change-Id: Ie3ab77f6f2e9f35b3afe1a99414a9c0b2bafb991
David van der Spoel [Mon, 29 Feb 2016 14:57:28 +0000 (15:57 +0100)]
Fixed division by zero in polarize.
Replaced dr2*gmx::invsqrt(dr2) by std::sqrt(dr2) in polarize.
invsqrt does not work for values that can be zero.
Change-Id: I1ae75bb64c35ac51afd237b2138cbacc86a425dc
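Note on the polarize fix above: a minimal sketch of why dr2*invsqrt(dr2) breaks at zero, assuming IEEE 754 arithmetic without fast-math options. invsqrtSketch is a plain stand-in and not gmx::invsqrt, whose implementation may fail in a different way.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for an inverse square root routine; returns +inf for x == 0.
inline float invsqrtSketch(float x)
{
    return 1.0f / std::sqrt(x);
}

// Old form: for dr2 == 0 this evaluates 0 * inf, which is NaN.
float distanceViaInvsqrt(float dr2)
{
    assert(dr2 >= 0.0f);
    return dr2 * invsqrtSketch(dr2);
}

// New form: well defined for dr2 == 0, where it returns 0.
float distanceViaSqrt(float dr2)
{
    assert(dr2 >= 0.0f);
    return std::sqrt(dr2);
}
```

Both forms agree for positive dr2, so the replacement changes the result only in the degenerate case.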
Carsten Kutzner [Fri, 18 Mar 2016 10:01:52 +0000 (11:01 +0100)]
Clarified license text in the .tex manual
This is the remainder of patch https://gerrit.gromacs.org/#/c/5659/
(which is to be abandoned), which was not already fixed
by https://gerrit.gromacs.org/#/c/5655/
Change-Id: I170f63d94a0ab0c4258f8a66b8c1a524aaf975ef
Nicholas Breen [Sun, 28 Feb 2016 21:22:59 +0000 (13:22 -0800)]
Various minor spelling fixes - messages and code comments only.
Subsequent revision: Also update copyright years as necessary.
Change-Id: I04ad9fd4948ea6c8598262cdd4b1fb87e5348248
Carsten Kutzner [Thu, 3 Mar 2016 10:55:48 +0000 (11:55 +0100)]
Fix membed initialization
In commit 29943fe59dca471b2 the membed initialization was copied over to do_md,
but in do_md, state_global instead of state (which is still a NULL pointer
at that point) needs to be passed to init_membed.
Change-Id: Ic43e66f054f0e9b139ff9b402eb7cf4d08d20731
Szilárd Páll [Tue, 15 Mar 2016 20:46:49 +0000 (21:46 +0100)]
Fixed OpenCL e/fshift clear kernel first invocation
During refactoring the energy/shift force buffer clearing OpenCL kernel
call was moved before the kernels get initialized. This could never
work, but the lack of error checking in the code and (apparently) lax
runtimes allowed it to go unnoticed for some time.
Change-Id: I4bc13bf5e4e33c0797d6f08ad3cd7185d548af96
Mark Abraham [Mon, 14 Mar 2016 14:32:27 +0000 (15:32 +0100)]
Merge branch release-5-0 into release-5-1
Change-Id: Ic214521c998b898175a3c9317534008faccef280
Teemu Murtola [Mon, 14 Mar 2016 14:55:36 +0000 (16:55 +0200)]
Add tests for empty strings in refdata
This is apparently fragile functionality when trying to rewrite the code
with something else than libxml2.
Change-Id: Iff6a3e76745e43212fc67a6481341d6413e72873
Jakub Krajniak [Thu, 25 Feb 2016 15:23:13 +0000 (16:23 +0100)]
Fix tabulated wall potential.
The dispersion and repulsion of the tabulated wall potential were
too small by a factor of 6 and 12, respectively.
Fixes #1912.
Change-Id: I56c4b81e7f40f5d5dc9e0f2a184cf4a1bb01e3e5
Berk Hess [Fri, 12 Feb 2016 17:58:17 +0000 (18:58 +0100)]
Remove restriction on dihedraltypes order
Wildcards are allowed for dihedraltypes by grompp. But it was assumed,
with only code and no user documentation, that wildcard types come
after fully specified types. If this order was switched, grompp would
silently choose the wildcard matches. Now the match with the most real
atom type matches is chosen.
Added a paragraph to the bonded parameter section in the manual on
dihedral wildcard matching.
Fixes #1901.
Change-Id: I7e44ff19e4d069d1b186ea470438a47d48f1a72d
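Note on the dihedraltypes change above: a minimal sketch of order-independent matching that prefers the entry with the most real (non-wildcard) atom type matches. It assumes "X" is the wildcard name and ignores reversed atom order; all function and type names are hypothetical and not taken from grompp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using AtomTypes = std::array<std::string, 4>;

// Returns the number of positions where the entry names a real atom type
// equal to the query, or -1 when the entry does not match the query at all.
int countRealMatches(const AtomTypes &entry, const AtomTypes &query)
{
    int realMatches = 0;
    for (std::size_t i = 0; i < entry.size(); ++i)
    {
        if (entry[i] == "X")
        {
            continue; // wildcard: matches anything, but is not a real match
        }
        if (entry[i] != query[i])
        {
            return -1;
        }
        ++realMatches;
    }
    return realMatches;
}

// Returns the index of the matching entry with the most real matches, or -1
// when nothing matches. The result does not depend on whether wildcard
// entries come before or after fully specified ones.
int findBestDihedralType(const std::vector<AtomTypes> &entries, const AtomTypes &query)
{
    int best        = -1;
    int bestMatches = -1;
    for (std::size_t i = 0; i < entries.size(); ++i)
    {
        const int matches = countRealMatches(entries[i], query);
        if (matches > bestMatches)
        {
            best        = static_cast<int>(i);
            bestMatches = matches;
        }
    }
    assert(best < static_cast<int>(entries.size()));
    return best;
}
```

Taking the first match instead would make the result depend on file order, which is the restriction the commit removes.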
Teemu Murtola [Sat, 3 Oct 2015 04:12:14 +0000 (07:12 +0300)]
Support replacing solvent in insert-molecules
Make it possible to specify the solvent (or other set of atoms) with
-replace (as a selection) for gmx insert-molecules, and make the tool
replace residues from this set with the inserted molecules, instead of
not inserting there. It is assumed that the solvent consists of
single-residue molecules, since molecule information would require a tpr
input, which might not be commonly available when preparing the system.
Add a basic test for the functionality.
Change-Id: I3a60cdcec9a1675a116c13d83465b07a69b3388d
Mark Abraham [Sat, 12 Mar 2016 14:29:42 +0000 (15:29 +0100)]
Update use of isnan in linearalgebra
In C++11, cmath now provides std::isnan, so we should use it.
Change-Id: Id3c9f3103722c2ae94c93f09cea74a27852bcbc2
Mark Abraham [Sat, 12 Mar 2016 13:31:49 +0000 (14:31 +0100)]
Find hwloc quietly if it's already been found
Change-Id: I9463decfd16731ddb8da4e6badb679bd53bf8900
Berk Hess [Mon, 29 Feb 2016 21:08:52 +0000 (22:08 +0100)]
Split off Fermi kernel from CUDA kernel
There are now two CUDA nbnxn kernel files, one for Fermi and one for
all architectures after Fermi. Compute capability 2.x does not support
many features used in the newer kernel, so this change removes many
preprocessing macros from the code and simplifies the shmem math.
This reorganization is even more useful with the coming combination
rule kernels that add more macros and shmem branching.
Also removed the ci_offset variable and replaced &~4 by &3.
Change-Id: I6fa50970215ad32e4d3e4430af04b403db83abc2
Roland Schulz [Wed, 2 Mar 2016 20:58:23 +0000 (13:58 -0700)]
Fix MPI_IN_PLACE test
The test failed with Intel MPI because CMake adds libmpicxx, which depends
on libstdc++.
Change-Id: I7b4e485502779ce923fe511ed99ad224166b9aa1
Mark Abraham [Tue, 29 Dec 2015 16:42:23 +0000 (17:42 +0100)]
Add infrastructure for tests to compare mdruns
These can now compare energies and forces within a given tolerance,
though nothing in this commit actually uses the new infrastructure.
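As an illustration of what a comparison within a tolerance can look like, here is a minimal sketch using a relative tolerance. The function name and the choice of a purely relative criterion are assumptions for the example; the actual test infrastructure may compare differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if both sets have the same size and each pair
// of values differs by at most relativeTolerance times the larger magnitude.
bool allWithinTolerance(const std::vector<double> &reference,
                        const std::vector<double> &test,
                        double                     relativeTolerance)
{
    if (reference.size() != test.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < reference.size(); i++)
    {
        const double scale = std::max(std::fabs(reference[i]), std::fabs(test[i]));
        if (std::fabs(reference[i] - test[i]) > relativeTolerance * scale)
        {
            return false;
        }
    }
    return true;
}
```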
Added and updated input files for various kinds of argon, water,
peptide and lipid boxes.
Added some functionality so that scoped_ptr can work with the legacy
APIs for reading files and managing data structures.
Change-Id: I463f7902568ec778871e0ae15898f240ff51d3ad
Mark Abraham [Fri, 11 Mar 2016 00:37:02 +0000 (01:37 +0100)]
Merge branch release-5-0 into release-5-1
Change-Id: Icb8d7bc8d5639a824fe5f45e72eede3ec7b9d3c9
Roland Schulz [Fri, 26 Feb 2016 02:00:43 +0000 (18:00 -0800)]
Improve AVX512
Reduce usage of masking when not necessary.
Fixed FPE in debug mode.
Change-Id: I88e6324535471669faf1c30642128242d5f54e7d
Mark Abraham [Tue, 16 Feb 2016 22:29:17 +0000 (23:29 +0100)]
Require 2015 version for MSVC
C++11 support in this compiler is best in its latest version, so requiring
it is an acceptable compromise for this platform.
This is not good for a CUDA build, which won't officially support MSVC
2015 until CUDA 8, which is unlikely to be released before GROMACS
2016. Thus, there are likely to be a few months during which CUDA-enabled
GROMACS 2016 cannot be built with a supported MSVC host compiler.
MSVC 2015 adds warnings for illegal implicit narrowing of double to
float, when used in a brace initializer. In some cases
* we intend the interpretation as real, which is now explicit
* we can just use double
* we can suppress the warning
* in some test code, it is more convenient to initialize as double and copy
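The first case in the list can be sketched as below. The example assumes a single-precision build in which real is float; the Params struct and makeParams function are hypothetical.

```cpp
#include <cassert>

// Assumption for this sketch: single precision, so real is float.
typedef float real;

struct Params
{
    real a;
    real b;
};

Params makeParams(double x)
{
    // Params p = { x, 0.5 * x };  // implicit narrowing of double to float
    // Making the intended conversion to real explicit avoids the narrowing:
    Params p = { static_cast<real>(x), static_cast<real>(0.5 * x) };
    return p;
}
```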
Change-Id: Ic6b2f9165b6f1aaa3dc59ce05cd6ffb3abe8861c
Berk Hess [Wed, 9 Mar 2016 22:09:51 +0000 (23:09 +0100)]
Fix Ewald 3DC dipole with Verlet scheme
With the Verlet scheme, the dipole for the Ewald 3DC correction was
summed twice over the PP-ranks at non-NS steps, leading to a correction
that was a factor of #PP-ranks too high.
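Why a second summation inflates the dipole by the number of ranks can be seen in a small simulation. This sketch models the ranks as entries of a vector and is not the mdrun communication code; sumOverRanks is a hypothetical stand-in for a global sum.

```cpp
#include <cassert>
#include <vector>

// Simulates a global sum over ranks: afterwards every rank holds the total.
void sumOverRanks(std::vector<double> *dipolePerRank)
{
    double total = 0;
    for (double v : *dipolePerRank)
    {
        total += v;
    }
    for (double &v : *dipolePerRank)
    {
        v = total;
    }
}
```

After one call every rank holds the total dipole D. A second call sums N copies of D, so every rank ends up with N times D, where N is the number of ranks.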
Fixes #1916.
Change-Id: Iefda70ea42f1b6a4c6bd4cbc7c0982114282c15c
Erik Lindahl [Wed, 12 Aug 2015 16:02:42 +0000 (18:02 +0200)]
Random engines & distributions as proper C++11 classes
This change implements the ThreeFry2x64 random engine with flexible
number of encryption rounds and internal counter bits. The class is
compatible with the C++11 random number generators, and the GROMACS
tabulated normal distribution has likewise been turned into a
random distribution compatible with C++11, meaning they can be used in
almost any combination with the standard library distributions.
- The ThreeFry2x64 implementation uses John Salmon's idea of a template-
selected internal counter so a number of bits are reserved to generate
an arbitrary random stream. This makes it possible to use ThreeFry as
a normal random engine, and even in counter mode it is possible to
draw an arbitrary amount of random numbers before restarting counters.
- Both accurate (20-round) and fast (13-round) versions are available.
- There is a gmx::DefaultRandomEngine when we don't care about details.
- gmx::GammaDistribution has been added to work around bugs in
libstdc++-4.4.7 headers, and to avoid getting different results
for libstdc++ vs. libc++.
- Custom uniform, normal, and exponential distributions have been added
to make all results reproducible across platforms since libstdc++ and
libc++ do not use the same generating algorithms.
- Code using random numbers has been updated, but no changes have been
made to turn random seeds into 64 bits yet.
- The selection nbsearch unit test was a bit fragile and very sensitive
to the specific coordinate values; this has been fixed so it should
be resilient no matter what RNG is used in the future.
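The idea of a counter-based engine with reserved internal counter bits that also works with standard library distributions can be sketched with a toy class. Everything here is an assumption for illustration: ToyCounterEngine is not the gmx API, and the mixing step is a SplitMix64-style finalizer standing in for ThreeFry2x64.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>
#include <stdexcept>

// Stand-in bijective mixing step (SplitMix64 finalizer), NOT ThreeFry.
static std::uint64_t mix64(std::uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Toy counter-based engine: the low internalBits bits of the counter word
// are reserved for the stream position, the high bits hold a user counter.
template<unsigned internalBits>
class ToyCounterEngine
{
    static_assert(internalBits > 0 && internalBits < 64, "need 1..63 bits");

    public:
        typedef std::uint64_t result_type;

        ToyCounterEngine(std::uint64_t key, std::uint64_t userCounter)
            : key_(mix64(key)), counter_(userCounter << internalBits), used_(0) {}

        static constexpr result_type min() { return 0; }
        static constexpr result_type max()
        {
            return std::numeric_limits<result_type>::max();
        }

        // Returns the next value; throws once the reserved bits are used up.
        result_type operator()()
        {
            if (used_ == (std::uint64_t(1) << internalBits))
            {
                throw std::runtime_error("internal counter bits exhausted");
            }
            return mix64(key_ + counter_ + used_++);
        }

    private:
        std::uint64_t key_;
        std::uint64_t counter_;
        std::uint64_t used_;
};

// The engine satisfies the standard generator interface, so it can be
// passed to a standard library distribution.
double drawUniform(std::uint64_t key, std::uint64_t userCounter)
{
    ToyCounterEngine<8>                    rng(key, userCounter);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(rng);
}
```

The same key and user counter always reproduce the same stream within one standard library. The values a standard distribution produces from that stream can still differ between standard library implementations, which is the motivation given above for the custom distributions.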
Change-Id: I47a04d03e2f264e1a6ef0aa0a2174cb464ed9af7
Erik Lindahl [Mon, 18 May 2015 22:36:57 +0000 (00:36 +0200)]
Support for portable hardware locality (hwloc)
Added CMake support to detect hwloc, and extended the
HardwareTopology class to read information from hwloc when present,
with the CpuInfo class as fallback. All available information is
dumped to the log file at the beginning of the run. Tested to
work on x86 (Intel, AMD, both single- and dual-socket), Power8 and
32-bit ARM, and with hwloc versions back to 1.5 (for older hardware).
Change-Id: Id98819254bdfdcb5725f1aee625ecb65dc40f752
Mark Abraham [Sat, 20 Feb 2016 13:07:06 +0000 (14:07 +0100)]
Add SETTLE unit tests
Now that there is a decent seam for how to set up and run SETTLE, we
can have some unit tests that will support making further changes.
Also taught the include sorter that tuple is a C++11 header that we
now use.
Change-Id: I02c2ee19df0322e7c36dd7e65bb992d76f63a5da
Berk Hess [Tue, 2 Feb 2016 09:38:11 +0000 (10:38 +0100)]
Converted csettle() to use real/SimdReal template
The csettle() routine for constraining coordinates has been converted
to call a template function that is instantiated with either real or
SimdReal. On Haswell this makes settle a factor of 5 faster.
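The pattern of writing arithmetic once and instantiating it for a scalar or a SIMD type can be sketched as below. Toy4 is a hypothetical 4-wide stand-in for a SIMD wrapper class, and squaredDistance is an arbitrary example kernel, not part of csettle().

```cpp
#include <cassert>

// Toy 4-wide type standing in for a real SIMD wrapper class.
struct Toy4
{
    float v[4];
};

inline Toy4 operator+(const Toy4 &a, const Toy4 &b)
{
    Toy4 r;
    for (int i = 0; i < 4; i++)
    {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}

inline Toy4 operator*(const Toy4 &a, const Toy4 &b)
{
    Toy4 r;
    for (int i = 0; i < 4; i++)
    {
        r.v[i] = a.v[i] * b.v[i];
    }
    return r;
}

// One kernel, written once; T can be a scalar or the 4-wide type.
template<typename T>
T squaredDistance(T dx, T dy, T dz)
{
    return dx * dx + dy * dy + dz * dz;
}
```

Instantiating with float handles one value per call, while instantiating with Toy4 handles four values per call with the same source code.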
Added reorganized indices to the SETTLE data structure; these are set
by a new function settle_set_constraints, analogous to LINCS.
Reorganized the SETTLE initialization.
The settle fatal error no longer reports the atom index of (one of) the
problematic water molecules, but the pdb dumps are more useful anyhow.
Change-Id: I61d81ad8a0add6fe234f8c7b5b44dc8c7084ace9
Erik Lindahl [Sun, 27 Dec 2015 10:28:25 +0000 (11:28 +0100)]
Fix length of SIMD4 boolean vector in reference build
While it will be correct for default builds, this length
should always be 4 even if the user changes the float
SIMD width.
Change-Id: Iea76382fa8c4819f39dc90c7177ad167e9c36ae1
John Eblen [Thu, 3 Mar 2016 13:53:07 +0000 (05:53 -0800)]
Minor corrections to license file COPYING
Change-Id: I31424b0919c4229069700d097550d0caa7a64039
Berk Hess [Mon, 7 Dec 2015 21:23:31 +0000 (22:23 +0100)]
Use SIMD transpose scatter in bondeds
The angle and dihedral SIMD functions now use the SIMD transpose
scatter functions for force reduction. This change gives a massive
performance improvement for bondeds, mainly because the dihedral
force update did a lot of vector operations without SIMD that are
now fully replaced by SIMD operations.
Change-Id: Id08e6c83d4c9943d790bfe2a40c70fa4697077af
Berk Hess [Mon, 7 Dec 2015 20:25:05 +0000 (21:25 +0100)]
Use rvec4 for listed force buffer
To allow aligned 4-wide SIMD force reduction, changed the force
buffer passed to ifunc from rvec to rvec4.
Because the listed and normal force buffers now have a different
format, the listed force buffer reduction over threads also needs
to reduce the buffer of thread 0 and is thus now also used without
OpenMP threading. This extra cost is more than outweighed by the
faster listed force updates.
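A minimal sketch of such a reduction, assuming Vec3 and Vec4 as hypothetical stand-ins for rvec and rvec4 and a hypothetical function name; the real buffers and reduction code are organized differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::array<float, 3> Vec3; // stands in for rvec
typedef std::array<float, 4> Vec4; // stands in for rvec4, 4th element is padding

// Adds every per-thread 4-wide buffer, including that of thread 0,
// into the normal 3-wide force array. The padding element is ignored.
void reduceListedForces(const std::vector<std::vector<Vec4> > &threadBuffers,
                        std::vector<Vec3>                     *force)
{
    for (const std::vector<Vec4> &buffer : threadBuffers)
    {
        assert(buffer.size() == force->size());
        for (std::size_t a = 0; a < force->size(); a++)
        {
            for (std::size_t d = 0; d < 3; d++)
            {
                (*force)[a][d] += buffer[a][d];
            }
        }
    }
}
```

Because the two formats differ, the buffer of thread 0 cannot simply alias the normal force array, which is why the reduction runs even with a single thread.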
The current force updates in bonded.cpp will be replaced by SIMD
transposeScatter in a separate commit.
Change-Id: I9a9027d983efaf0e919634a55bf2c906aad1fa90
Berk Hess [Thu, 25 Feb 2016 19:27:36 +0000 (20:27 +0100)]
Remove epsfac from GPU kernel inner-loops
Multiplying the i-charge instead of the j-charge with epsfac in
the GPU kernels removes a flop from the inner loop. On Maxwell with
CUDA the gain is 2-3% in the force kernels and 3-4% in the energy
kernels (which is more than a flop; probably one register fewer is used).
There is a single additional division in the energy kernel, but the
gain more than compensates for this.
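The hoisting can be illustrated on the CPU with a plain Coulomb energy sum. This is a sketch with hypothetical function names, not the GPU kernel code, and it does not model the additional division mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// epsfac multiplies the i-charge once, outside the inner loop.
double coulombEnergy(double epsfac, double qi,
                     const std::vector<double> &qj,
                     const std::vector<double> &r)
{
    assert(qj.size() == r.size());
    const double qiScaled = epsfac * qi;
    double       energy   = 0;
    for (std::size_t j = 0; j < qj.size(); j++)
    {
        energy += qiScaled * qj[j] / r[j];
    }
    return energy;
}

// Reference version: epsfac multiplies every j-charge inside the loop,
// which costs one extra multiplication per pair.
double coulombEnergyInnerScaling(double epsfac, double qi,
                                 const std::vector<double> &qj,
                                 const std::vector<double> &r)
{
    assert(qj.size() == r.size());
    double energy = 0;
    for (std::size_t j = 0; j < qj.size(); j++)
    {
        energy += qi * (epsfac * qj[j]) / r[j];
    }
    return energy;
}
```

Both versions give the same energy up to rounding.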
Change-Id: I6924b32f3f61d3b7bbe532f95361b7fefb577609
Berk Hess [Wed, 24 Feb 2016 13:53:08 +0000 (14:53 +0100)]
Minor code reordering in GPU kernels
Updating bCalcFshift just before use instead of at the top of the kernel
improves performance by 1-2% on CUDA. This also improves readability.
Making specialized (no)shift kernels will only add 1% gain.
Also updated the OpenCL kernels for consistency and readability
(the performance impact is negligible with current hardware/compiler).
Change-Id: I309f90ad61e5815726d55254e2cd38d5e4e7662d
Szilárd Páll [Fri, 26 Feb 2016 16:40:51 +0000 (17:40 +0100)]
Fix leftover load() calls in the simd util headers
Commit dc2488c8 left behind load() calls in a number of headers. This
change replaces these with simdLoad(), fixing compilation on ARM NEON,
IBM VSX, VMX, and QPX.
Fixes #1909
Change-Id: I6b5e70f738380ea95d862c207ba45bb48fe2c805
Berk Hess [Wed, 24 Feb 2016 20:11:17 +0000 (21:11 +0100)]
Fix compilation with Simd4Float, without Simd4Double
Commit 971335dc added Simd4Real in nbnxn_search.cpp when Simd4Float
was supported but Simd4Double was not, e.g. SSE in double precision.
Fixes #1910.
Change-Id: I92464aa9e63c08901e272c7b9337bfe8c741b0aa