BioD PNPI Git Repos - alexxy/gromacs.git/log

Merge release-5-0 into release-5-1

Conflicts:
cmake/gmxManageFFTLibraries.cmake
Fixes from release-5-0 go into only one branch of the refactoring in
release-5-1 that separates the own-fftw build from an installed-fftw
build.

docs/manual/monster.bib
Insertions in both branches, both retained

src/gromacs/gmxana/gmx_wham.cpp
Fixed formatting of copyright years.

src/programs/mdrun/mdrun.cpp
Relocation of please_cite() calls was adjacent to removal of bSepPot,
trivial resolution

Change-Id: I9d6116534775906d74ddbd45925176a21b5afcd9

Fix FindFFTW behaviour

FFTW 3.3.5 with --enable-avx* will enable any useful 128-bit
SIMD flavours by default. (See Erik's 579cec9a6 in their repo.) Our
detection code will observe this, and will be silent.

With earlier FFTW, if we're doing a GROMACS AVX build, and we have
FFTW SIMD, and not SSE(2) SIMD, then we want to warn the user to
reconfigure FFTW to add SSE support. Made this behaviour correct, and
minimized the necessary infrastructure for it.

Added detection support for some other SIMD-support symbols that
are present in the FFTW repo for upcoming hardware.

Fixes #1809

Change-Id: If586250895664581316505a5595da7442e789f8d

Fix mdrun -nb auto -rerun with GPU and energy groups

inputrec was being used after calloc, but before being read from the
.tpr file. This meant mdrun -nb auto -rerun behaved as if -nb gpu had
been used (gave a fatal error).

Moved the code that regulates whether GPUs are supported (e.g. with
Verlet+energy groups) into the new function
nbnxn_gpu_acceleration_supported(), so it can be used consistently
with other such checks. Split the old nbnxn_acceleration_supported()
into nbnxn_gpu_acceleration_supported() and nbnxn_simd_supported(),
which seems a bit simpler given that the old code was called twice,
with different constant values for bGPU.

Made the thread-MPI launch code aware of bUseGPU, so that we can call
nbnxn_gpu_acceleration_supported once and do all the things correctly.

The fatal error for mdrun -nb gpu -rerun with energy groups is now
issued by SIMMASTER in runner.cpp, rather than in md.cpp.

Despite appearances, this change doesn't alter mdrun behaviour much,
except as mentioned for -rerun, and that GPU detection is now run
for mdrun -rerun even when we will not end up using those GPUs.

Fixes #1823

Change-Id: I324d23ce98b0041dad148db8eac744c5dd5f66fb

Fix -seltype option for C++ tools

Because of a mistake in the logic for checking whether onlyAtoms()
was specified for all SelectionOptions, the -seltype option was never
enabled for any tool (since 5.0).
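The intended check can be sketched with std::all_of, assuming that -seltype should be offered unless onlyAtoms() was specified for every selection option. The struct and function below are hypothetical stand-ins; the commit does not show the original faulty expression.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the relevant part of a selection option.
struct SelectionOptionInfo
{
    bool onlyAtoms;
};

bool isOnlyAtoms(const SelectionOptionInfo& option)
{
    return option.onlyAtoms;
}

// -seltype is enabled unless every selection option is restricted to atoms.
// With no selection options, std::all_of returns true, so -seltype stays off.
bool seltypeOptionEnabled(const std::vector<SelectionOptionInfo>& options)
{
    return !std::all_of(options.begin(), options.end(), isOnlyAtoms);
}
```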

Change-Id: I365afdc53524f0a237654a603be1c051b8e55dc9

Fixed a bug in computational electrophysiology with DD

With domain decomposition, the non-master nodes did not correctly
compute the center of "broken" water molecules chosen for position exchanges.
This would lead to a crash soon after such a water was exchanged with
an ion. The problem was in the init_swapcoords() routine, which passed
an empty box vector on the non-master nodes and therefore got the
wrong PBC information except on master.
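To illustrate why an empty box breaks the center computation, here is a one-dimensional sketch. It is not the init_swapcoords() code; the helper name and the 1D simplification are assumptions. Atoms are unwrapped relative to the first atom with the minimum-image convention, which needs a real, positive box length.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-dimensional sketch (hypothetical helper): center of a molecule that
// may be split across a periodic boundary. Each atom is shifted to its
// minimum-image position relative to the first atom before averaging.
double centerOfPossiblyBrokenMolecule(const std::vector<double>& x, double boxLength)
{
    assert(!x.empty());
    // An empty (zero) box, as the non-master ranks had, carries no PBC
    // information, so no unwrapping would be possible.
    assert(boxLength > 0.0);

    const double reference = x.front();
    double       sum       = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        double dx = x[i] - reference;
        // Minimum-image shift: brings dx into [-L/2, L/2).
        dx -= boxLength * std::floor(dx / boxLength + 0.5);
        sum += reference + dx;
    }
    return sum / static_cast<double>(x.size());
}
```

For two atoms at 0.1 and 2.9 in a box of length 3.0, the unwrapped center is close to 0.0 (equivalently 3.0), whereas averaging the raw coordinates would give 1.5, on the opposite side of the box.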

Change-Id: I94715e0dfb58a206e27f45a6ab149c7b808d06c1

Fix bug in the computation of Coefficient of Thermal Expansion

Between release-4-6 and release-5 a bug was introduced by calling the
routine analyse_ener with the wrong flag. The flag bFluct can be set
to subtract the average before computing the ACF (which is a remnant
from old code that is being fixed in patch
https://gerrit.gromacs.org/#/c/4310/).
Because this flag was passed incorrectly to analyse_ener, the stored
energy was modified, yielding an incorrect Coefficient of Thermal
Expansion. (Passing the -nmol option, which is needed for fluctuation
property calculations, activated the wrong option.)

Change-Id: I303206c3118c2d5bb499ba4c04a94fe9de2c7992

Fix reaction-field warning message

The whole-function scope of err_buf combines with the inexpressive
macro CHECK to mean that we can give a warning using the error message
of the previous check.

Change-Id: Id9882994eee5ee5a0844f4adfb5d29a0042d51d5

Fix broken OpenCL detection

The OpenCL device compatibility test broke in c66e8fa for all platforms
other than OS X. This change makes the OpenCL code functional again.

Change-Id: Iebb015549e952c24ccb8c2118721a11064e036f2

Fix OpenCL build

A spurious semicolon in CMake code broke the OpenCL build;
this change fixes it.

Fixes #1824

Change-Id: I38a7d343184aff61718be43e17f23bec3c1eee4d

Detect incorrect cycle counting

When threads are not pinned to cores, the wallcycle counting can get
messed up on machines where the cycle counters are not synchronized
between the cores. Since it's difficult to detect thread pinning on
all architectures and without pinning the counting can be correct on
synchronized machines, we try to detect incorrect counters. We only
detect negative cycle counts, but that should catch nearly all cases.
When we detect an invalid count, we ignore the cycles counted and do
not print the cycle accounting table. Dynamic load balancing is never
disabled, because a few incorrect load measurements do not cause
problems here.
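The detection idea can be sketched as follows. This is a minimal illustration using a hypothetical accumulator type rather than the GROMACS wallcycle code. Any interval whose stop count is below its start count marks the accounting invalid, and that interval is not added to the total.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cycle accumulator illustrating negative-count detection.
struct CycleAccounting
{
    std::int64_t totalCycles = 0;
    bool         valid       = true;

    void addInterval(std::int64_t startCycles, std::int64_t stopCycles)
    {
        const std::int64_t cycles = stopCycles - startCycles;
        if (cycles < 0)
        {
            // Only possible with unsynchronized per-core counters, e.g.
            // when an unpinned thread migrated between cores.
            valid = false;
            return; // ignore this interval
        }
        totalCycles += cycles;
    }

    // The cycle accounting table should only be printed when valid.
    std::int64_t reportedTotal() const
    {
        assert(valid);
        return totalCycles;
    }
};
```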

Fixes #1821

Change-Id: I076ad685a043f1f0b913a9b089ebea43a62534f5

Add GROMACS 5 paper to paper lists

Also added a call to please_cite, and cite both recent papers
in the reference manual.

Change-Id: I5e7a81690a1d0fbea0147d8a2a5f281f7b5d4aab

Avoid integer overflow in gmx_wham by using gmx_int64_t.

Since gmx_wham internally uses femtoseconds, a cast to int can lead to
overflows in microsecond timescale simulations.
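The arithmetic behind the overflow: INT32_MAX is 2147483647, so a time stored as an int in femtoseconds overflows at about 2.1 microseconds. The sketch below uses std::int64_t in place of gmx_int64_t, and the helper name is hypothetical, not taken from gmx_wham.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert a time in picoseconds to integer
// femtoseconds. INT32_MAX fs is only about 2.1 microseconds, so a 32-bit
// int overflows for longer simulations; a 64-bit integer does not.
std::int64_t picosecondsToFemtoseconds(double timeInPs)
{
    const double fs = std::round(timeInPs * 1000.0);
    // Guard against values that even a 64-bit integer cannot hold.
    assert(std::fabs(fs) < 9.0e18);
    return static_cast<std::int64_t>(fs);
}
```

A 5 microsecond time (5.0e6 ps) becomes 5000000000 fs, which does not fit in a 32-bit int.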

Change-Id: Ib7589c3c1ce47b29b3ada458b85bf577b41996d1

Harden FindSphinx.cmake

If execute_process failed somehow, then SPHINX_VERSION_OUTPUT_VARIABLE
could be empty. If so, we want the detection to work, and to report
that a valid Sphinx was not found. This change makes sure the
string() command is valid CMake.

Change-Id: I0d6f483d15776402504c77689b5228b094dd605f

Bump patch version to prepare for 5.1.1

Change-Id: I1baa0f3adc26ed85b186086c7536c76c9b090b30

Version 5.1

Removed "-rc1" tag per policy. Bumped regressiontest hash.

Documented when LIBRARY_SOVERSION_MAJOR should typically be increased.

Change-Id: I98dc8206669724702f391de29bf68ed240bb1b48

Make mdrun -nt work again

Moved several thread count checks and assignments to a new
check_and_update_hw_opt_3 to clarify the options processing.
Also reorganized the PME thread count checks to avoid unclear
error messages for corner cases.

Renamed bOMP to bHasOmpSupport for clarity.

Converted assert calls to GMX_RELEASE_ASSERT (and added new ones).

Fixes #1803.

Change-Id: I35005d60d4981fa7c223449aaa53b663d4d13e1d

Avoid segmentation fault in gmx chi.

If a system containing a custom residue is being analyzed, gmx chi
would seg fault if that residue was not in residuetypes.dat. This
can occur, for instance, when the topology is created in some
directory and analysis in another, or if the analysis is done on
a different machine.
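A defensive lookup along these lines avoids dereferencing a missing entry. The helper and the fallback type name "Other" are assumptions for this sketch; the commit does not describe how gmx chi now handles such residues.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lookup: a residue missing from the type table (as can happen
// when residuetypes.dat differs between machines) falls back to a generic
// type instead of producing an invalid result that is later dereferenced.
std::string residueType(const std::map<std::string, std::string>& table,
                        const std::string&                        residueName)
{
    assert(!residueName.empty());
    const auto it = table.find(residueName);
    return it != table.end() ? it->second : std::string("Other");
}
```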

Fixes #1802

Change-Id: I85df855381c8e233020fd1e1f73165d4e1215ab9

Make it easier to find selection help

Now the command-line reference part of the user guide has a TOC entry
that has the magic word "selection" in it.

Fixes #1805

Change-Id: Ib99840ab1f3fd72a52f28ffa1e06ee1242fa4310

Make it easier to find selection help

Fixes #1805

Change-Id: Ie09baa75601f6ba3e8e514f78e3bae34e7327f86

Re-enable multi-GPU support in OpenCL for thread MPI

There was no known problem with thread MPI when I disabled it. Real
MPI segfaults at context creation time with the AMD runtime and more
than one rank on a node. Made this possible, and updated docs.

Also updated some other docs and OpenCL TODOs.

Change-Id: I32a4eb5e02176999587feff7a2754f40029adaa8

Documentation fixes

Change-Id: Ifc9fbcc6ed8214154915120067c080826d108b3f

Add clFlush to kick off OpenCL work

The OpenCL standard specifies clFlush as a necessary step that issues
all enqueued commands, ensuring CPU-GPU concurrency that is otherwise
not guaranteed. For this reason, three flushes are added to dispatch
work in the queue.

Additionally the specs (v1.2 sec 5.13) state that a flush is required in
the inter-stream synchronization case.

In total four flushes are added, their overhead seems to be small.

Fixes #1784

Change-Id: Ia287998c2716e21708979d6e8d261f853e39d4ef

Avoid FP exception with empty .mdp file

The default value for nsteps is zero, which used to lead to division
by zero in the above case. Refactored to avoid that.

Added a basic test case for grompp with an empty .mdp file (which
causes all default settings to be used). It is not optimal that the
mdrun integration test machinery is recycled for this, but
re-implementing or refactoring that to avoid the ugliness is not great
for a release branch.

Noted some TODOs. Renamed some variables for clarity.

Change-Id: I5c98c4f460f658277490b11f7fb5629d6fa84a08

Add documentation for energygrps with GPUs

There's other documentation for this, but it's hard to cross-reference
everything.

Change-Id: I400f5cd8179428d0c0c340efd6314404e273be43

Add to author lists

In particular, Jiri implemented much of the NVML support in GROMACS 5,
and has made other contributions to the GROMACS CUDA support over the
years.

Change-Id: I605eea83c50ccd2516779ecbe9397d00324dc40b

Prevent AMD OpenCL on OS X prior to 10.10.4

OS X release 10.9.x has been confirmed to produce invalid OpenCL
kernels for AMD (but correct for NVIDIA), and we have tested that
it is fixed in 10.10.4. The fix might actually have appeared
already in 10.10.3, but it's not worth tracking down the exact
point since 10.10.5 is already out.
This change will issue a CMake warning when compiling GROMACS
on earlier OS X releases, and at runtime we check the version
again and mark AMD GPUs as incompatible.

Also fixed docs about the length of time JIT compilation takes
on AMD.

Refs #1783

Change-Id: I0202faea60c39daae6621d2bb9ba828aab5532a0

Fix for processors being offline on Arm

Use the number of configured rather than online CPUs.
We will still get a warning about failures when trying to
pin to offline CPUs, which hurts performance slightly.
To fix this, we also check if there is a mismatch between
configured and online processors and warn the user that
they should force all their processors online
for better performance.
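Counting configured rather than online CPUs can be done with sysconf(). The sketch below assumes a platform that provides _SC_NPROCESSORS_CONF and _SC_NPROCESSORS_ONLN (Linux/glibc and macOS do, although POSIX does not require them). The struct and function names are hypothetical, not the GROMACS hardware-detection API.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical container for the two CPU counts.
struct CpuCounts
{
    long configured; // includes CPUs that are currently offline
    long online;     // only CPUs available right now
};

CpuCounts queryCpuCounts()
{
    CpuCounts counts;
    counts.configured = sysconf(_SC_NPROCESSORS_CONF);
    counts.online     = sysconf(_SC_NPROCESSORS_ONLN);
    assert(counts.configured > 0 && counts.online > 0);
    return counts;
}

// True when some processors are offline, i.e. when it is worth advising
// the user to bring all processors online for better performance.
bool someCpusOffline(const CpuCounts& counts)
{
    return counts.online < counts.configured;
}
```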

Change-Id: Iebdf0d5b820edcd7d06859a2b814adf06589ef96

Abort if PME tuning is active and counters reset

Triggering counter reset (in various ways) could happen at a
non-nstlist step, which provokes a software inconsistency error in
5.1. This reveals that all recent releases have permitted reset
while tuning was active, which is useless and potentially wrong.

Introduced a getter pme_loadbal_is_active, so that the fatal error
can be issued when conditions for counter reset are satisfied
and PME load balancing is still active.

Noted a TODO to have the load-balancing module use its own getter in
future; such a refactoring is probably fine, but worth avoiding in a
bugfix branch. Noted a TODO to make a counter-reset module, consider
alternative solutions to #1781, and other clean-up. Documented some
stuff.

Fixes #1781

Change-Id: I912e3da837bd32280f295ad98cc6b8170f4d2d81

Fix some stuff in user and install guide

Some code was not showing up correctly. Removed prompt ('%') for some
code. Added gmx prefix where missing. Fixed some typos.

Change-Id: I7e2352e340547e4eb5794b8de323fce540709a89

Require tune_pme to take -mdrun parameter

The transition away from many binaries in 5.0, and the removal of
symlinks in 5.1 mean there is even less chance that a sensible default
command line for calling mdrun can be provided. In general, mdrun
might be in either precision, with or without MPI, with or without
some custom suffix, inside the wrapper binary or from an mdrun-only
build, or from a custom directory.

Thus, the user is now required to say what command to call to run the
simulation they want to optimize. The MDRUN environment variable
remains as an undocumented and deprecated convenience feature for 5.1.

Fixes #1754

Change-Id: Ia2c278732ef2a79d7304967b230149eca4597888

Merge branch release-5-0 into release-5-1

No conflicts

Change-Id: I551220310b529542d4cc3635815101cf53b325ec

Fix some phrases that were not appearing in manual

Some phrases were in the index tag in the manual and not being
actually displayed. For example, "\index{potential function}s" only
displayed the letter "s". This should be "potential
functions\index{potential function}". This adds the appropriate
phrases next to the index tag.

Change-Id: Ia8e308f00a2542c1ee8c52fb017ba38926407876

Fix docs of ftypes of dihedrals

The ftype indices for restricted dihedrals and combined
bending-torsion interactions were documented the wrong way around.

Fixes #1796

Change-Id: I5d3c52cc0f326c9e379efe50d0841940d51e4894

Fix FindNVML.cmake

Change-Id: I4ed1a1ba20e74b9fb2bd0abbafc87caedc0c1c2f

Avoid using C++11-specific data() of std::vector

Replaced the data() method of std::vector with the
address of the first element.
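In C++98 the usual replacement for data() is &v[0], which, unlike data(), is not valid for an empty vector, so that case needs explicit handling. A minimal sketch; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pre-C++11 replacement for std::vector::data(): the address of the first
// element, or NULL for an empty vector (where &v[0] would be undefined).
template <typename T>
T* vectorData(std::vector<T>& v)
{
    return v.empty() ? NULL : &v[0];
}

// Example consumer with a C-style (pointer, length) interface.
double sumArray(const double* values, std::size_t count)
{
    assert(values != NULL || count == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i)
    {
        sum += values[i];
    }
    return sum;
}
```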

Change-Id: I9b6340b4823fb65c7f284a5e8c972caaa24930b1

Fix thread-affinity checking

Merge commit e3d1a22325113 introduced HAVE_SCHED_AFFINITY to master
after it was renamed from HAVE_SCHED_GETAFFINITY in 8b7f2d16750 in
release-5-0. But in master, another use of HAVE_SCHED_GETAFFINITY had
already occurred in 15d71933ab5, and we didn't notice the problem in
the merge. Found while converting the source file to C++ in master,
when compilers complained about the unused symbols. More good reasons
for -Wundef, -Wunused, C++ conversions, and minimizing refactoring in
release branches.

Change-Id: Idbdf03de3da87bd76be29c8e1c96045531cd3c4e

Fix em dash in new reference

This sometimes gets rendered badly

Change-Id: I2f6570cecb66777270c099b272b296c16800a43d

Fix PGI compiler flag

Work around compiler crashes by removing inter-procedural analysis.
Removed the extra -fastsse option which is identical to the -fast
option already added for release build types. Updated the warning
about not recommending PGI for now.

Change-Id: I0a43e19f3035a44330d8d32bb557639357100aa8

Merge release-5-0 into release-5-1

Change-Id: If110ea29e1756bf6c0b18d3d3852ee4099641ea1

Fix use of wrongly-named GCC minor-version variable

Change-Id: I878eec93f09077b7410c95183bb462c9ddf1c50b

Fix possible linking of libxml2 to zlib

Not all linking scenarios can do the right thing with a naked -lz, so
we should use the zlib that was already found, which will be a full
path to the library.

Change-Id: Ie63278ea4db2c16bcf92aaa804530da7996ab6f1

Merge branch release-5-0 into release-5-1

Conflicts:
CMakeLists.txt
Version numbering management code has moved to new home, nothing
needs to change here from the bumps to numbering in release-5-0
branch.

Change-Id: I8e45709c83d5c181900cca50b2ccba489630bbe7

Stop mdrun printing potentially undefined value

Variable pv is only defined under conditions that need not be identical
to the presence of pressure coupling, and compilers cannot be sure
that it is defined.

Change-Id: I80670e12b799c7f476caa565ef7ce9c3e38b0686

Version bumps after new release

Change-Id: Icfa27c2f62f4eb83cc57102e8b9239e1d8891e38

Version 5.0.6

Removed -dev tags from versions. Bumped regressiontest hash.

Change-Id: Ie86dd0d7a375c3fc86d59f59494c03bf350fde38

Merge release-5-0 into release-5-1

Conflicts:
src/gromacs/gmxpreprocess/grompp.c (adjacent changes, took both)

Additionally, added a declaration for a variable in sasa.cpp that had
been removed from enclosing scopes.

Change-Id: I15c10ef00416aa1f791a58b97d79669efab9a1c5

Update advice about grompp -t

grompp -t state.cpt does not copy coupling-algorithm state from a .cpt
into the new .tpr, even though it can still do so from grompp -e.

Refs #1775

Change-Id: I8c4d68fc8d3750b79f30c0f77115c80e1f3cf9b3

Fixed gmx sasa output residue numbering

gmx sasa with -or was writing the wrong residue numbers

Change-Id: I52b13f1eeec2ee028e5ec580139c0290df7b69c9

Fix incorrect warning about only using single core

check_resource_division_efficiency() warned that only
a single core could be used, even when one of
thread-MPI or OpenMP support was present, but not both.

Fixes #1776.

Change-Id: I6ac98954c2ef74ed860750627f7a1b9f0710561e

Removed information about obsolete .edo file format

Essential dynamics output has been written to the edsam.xvg file since version 4.6.

Change-Id: I0daca58fab1ebc953e8799de7e1667c4956b460e

Implement zsh shell completions.

zsh completions were not implemented when the gmx prefix was used. Use the zsh
builtin bashcompinit to reuse the bash completions. Move 'shopt -s extglob'
from gmx-completion.bash into GMXRC.bash, since it is incompatible
with zsh. gmx-completion.bash and gmx-completion-gmx.bash can now be used for
completions in both bash and zsh.

Change-Id: Ib8f3cf0535d39e91a6b31933f41aa7548755b351

Update example old mdp file in user guide.

Update mdp file in user guide. Remove obsolete parameters, and choose parameters
that do not give errors or warnings by default. Also remove the contents of
mdout.mdp, since the user can easily retrieve that with the given command, and
the file is now much larger. Add note on parameters being dependent on force
fields. Fixes #1774.

Change-Id: I2d5f1cfcf933275f95352c25891ee0792944212c

Fix spelling errors in user guide.

Fix spelling errors in user guide. Also update spelling to American
English, only because that is what is used in the manual and in other
documentation (e.g., neighbour -> neighbor).

Change-Id: Ib11d5be727bef88750a8aa45f2592d10aabb929e

Fix -deffnm -multi[dir]

Make -deffnm work even when all other functionality in
FileNameOptionManager is disabled (which happens with -multi and
-multidir). Add a test for the broken case.

Fixes #1769.

Change-Id: I0f729eaa1ab6e9a4a25d1d121cbfe5b5b2673e4b

Merge release-5-0 into release-5-1

Conflicts:
src/gromacs/gmxana/gmx_dos.c

Change-Id: I06208bcafb880b9bac3f45050d709fae5eab9a9d

Give bTypePerturbed a valid initial value

Commit a6ae71b202 didn't do this correctly, so FEP with LJPME has had
broken PME load estimates (only).

Change-Id: I88235c34b499e6dfca650138d66cdc17bd40afb4

Fix grompp .edr IO estimate for free-energy calcs

This has been wrong since it was introduced in c7a82654f. It doesn't
really matter, except that the variable is unused and converting to
C++ complains about it.

Change-Id: I8e13a07e2680cfe5b69fac170d500b235a5ba113

Merge release-4-6 into release-5-0

Change-Id: I25fea1226adfaa332c5c7b0630e99031266178f4

First 5.1 release candidate

Change-Id: I89776d6dd4d170d4f259fa8ee761cf4f6878cbe0

Fix too small GPU pair count estimates

For triclinic unit-cells with DD the non-local cluster pair count
estimate was too high, especially for thin local domains, due to an
incorrect estimate of the cluster size. Since the pair count estimate
for the local pair-list was determined as a total minus a non-local
estimate, the local estimate could get negative and cause exceptions.
Fixed the cluster size estimate and added a lower limit for the local
size estimate.
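The fix can be sketched as a clamp on the derived local estimate. The function name and the caller-supplied lower limit are assumptions; the commit does not state which limit GROMACS uses.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch: the local pair-count estimate is derived as total
// minus non-local, so an overestimated non-local count can drive it
// negative. Clamping to a lower limit keeps the local estimate usable.
int localPairCountEstimate(int totalEstimate, int nonLocalEstimate, int minLocalEstimate)
{
    assert(minLocalEstimate >= 0);
    return std::max(totalEstimate - nonLocalEstimate, minLocalEstimate);
}
```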

Fixes #1762.

Change-Id: I3489550968f66bc03ba4e6056017a58eba37f7cc

Fix bug in GPU list balancing

The function split_sci_entry could produce empty lists. This seems
not to have caused incorrect results, only slight extra processing
of empty workunits in the CUDA kernel. Incorrect Coulomb energies
could appear for empty lists with shift=CENTRAL, but that does not
seem to happen.

Refs #1767.

Change-Id: I0b0ff0a450734d4863f1e9636ff5741d4f1a68da

Fix DD DLB state issue

The introduction of DLB locking for PME load balancing added another
DLB state, which was stored in a third variable. These variables
were not always all properly checked. Simplified the code by merging
these three state variables into one. In addition, a fourth
variable (bGridJump) in gmx_domdec_t has been replaced by calls to
a function returning whether DLB is on.
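A sketch of the single-state approach, using illustrative enumerator and function names rather than the GROMACS ones: one enum replaces the separate flags, and whether DLB is on is derived from it instead of being stored in another variable such as bGridJump.

```cpp
#include <cassert>

// Hypothetical single DLB state variable replacing several flags.
enum class DlbState
{
    offUser,              // turned off by the user
    offCanTurnOn,         // off, may be turned on automatically
    offTemporarilyLocked, // off and locked, e.g. during PME load balancing
    onCanTurnOff,         // on, turned on automatically
    onUser                // turned on by the user
};

// Derived query instead of a separately stored flag.
bool isDlbOn(DlbState state)
{
    return state == DlbState::onCanTurnOff || state == DlbState::onUser;
}

// Automatic turn-on is only allowed from the unlocked "off" state.
DlbState tryTurnOnDlb(DlbState state)
{
    return state == DlbState::offCanTurnOn ? DlbState::onCanTurnOff : state;
}

// Releasing the lock is only meaningful in the locked state.
DlbState unlockDlb(DlbState state)
{
    assert(state == DlbState::offTemporarilyLocked);
    return DlbState::offCanTurnOn;
}
```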

Refs #1760.

Change-Id: I80d499149e4e5bfd689e76208384a8ba61e2842a

Fix bug in GPU list balancing

The function split_sci_entry could produce empty lists, which can
cause illegal memory access or incorrect energies. Before commit
6106367b this bug was never triggered, since nsp_max was never smaller
than a full cj4 entry. But 6106367b introduced a bug that could
produce negative nsp_max.

Fixes #1767.

Change-Id: I2007cf6851f94f4f2ca62f609a0628725014dbe7

Fix copy-paste bug in gmx distance

The -oxyz option did not behave properly (the computed values should be
fine, but the behavior of where the output goes can be unpredictable).

Change-Id: Idcd389c3809189f85a630094b9aaea6d61a5f954

Obey OpenMP thread count limit with tMPI

With thread-MPI mdrun would choose the number of OpenMP threads so
that the maximal number of hardware threads was used. When the number
of ranks was limited by the system size, this led to too high OpenMP
thread counts which lowered the efficiency. Now a limit is imposed.
Also updated some comments and renamed constants and bNTOptSet.
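The policy can be sketched as a clamp on the per-rank OpenMP thread count. The function and the limit parameter are hypothetical; the commit does not give the actual limit used here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch: spread the hardware threads over the thread-MPI
// ranks, but never exceed a maximum OpenMP thread count per rank, even if
// that leaves hardware threads idle when the rank count is limited.
int openmpThreadsPerRank(int hardwareThreads, int numRanks, int maxOpenmpThreads)
{
    assert(hardwareThreads > 0 && numRanks > 0 && maxOpenmpThreads > 0);
    const int fillAllThreads = std::max(1, hardwareThreads / numRanks);
    return std::min(fillAllThreads, maxOpenmpThreads);
}
```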

Change-Id: I830b5a3f2fd28f87acfbcf982103b62fc3e45758

Fix two PME DLB trigger issues

Dynamic load balancing got triggered while locked by PME load
balancing, because a check was placed incorrectly.
PME load balancing would never trigger with separate PME ranks
because a comparison was inverted.

Fixes #1760.
Fixes #1763.

Change-Id: I75eeb32423b864f84bfd45ecb61d169b473ed74a

Fix ThreadMPI GPU assumptions

The OpenCL implementation introduces the constraint of one GPU per
node, but thread-MPI still assumed any compatible GPU was available
for use and thus should have a rank.

Consolidated the configure-time constants behind some API functions so
that we can use the same behaviour in the various setup code.

Added a warning message that the OpenCL implementation has to waste a
GPU, stopped showing another warning message related to wasting
GPUs when the OpenCL implementation forces this, and improved
another message to clarify why gmx mdrun -ntmpi 2 won't work
with OpenCL.

Also fixed a few references to thread-MPI threads that are better
called thread-MPI ranks.

Change-Id: I4664c49786ebd26a53cbf5e1c26df79649ba4f5f

Correct grompp pull warning message

A warning for pull-coord?-groups referred to pull-coord?-geometry
instead. This is fixed by changing the order in which the pull
options are processed, which better reflects the dependencies. Also reordered
the options in the mdp manual.

Change-Id: I6309d021282156cd3409af35bcfa38dc2cab1c67

Fix OpenCL compilation errors.

Fixes a typo in a structure. Also fixes an incorrect
variable name only visible on OS X.

Fixes #1765

Change-Id: I0ee0f61da1f036163aa85f719ef9ceb0dab06868

Remove status messages about Sphinx detection

Make FindSphinx.cmake and FindPythonModule.cmake respect the QUIET
option, and pass that to find_package() to not print out information on
every CMake run. Most people will not care whether these are found or
not, and being silent in all cases is the same approach as is used for
Doxygen.

In master, it could be useful to change at least some of the documentation
build rules such that they require GMX_DEVELOPER_BUILD to be set, and
that could also enable messages about not finding the components needed
for the documentation build, but that is outside the scope of this
change.

Fixes part of #1761 and #1764.

Change-Id: I196f5e66c94fe4247ae28bd230a469acbaad939a

Fix inconsistent OpenMP automation

The thread MPI single rank max thread count for non-Intel was 6,
this was smaller than the max allowed MPI+OpenMP thread count of 8,
which caused setups to be generated that did not pass the check.
Increased 6 to 8 and added an assertion.

Change-Id: I13787616d7c667cba3245da4f5b5c3a1a6a1206d

Document how to add and use NVML support

Change-Id: I8ca7c5d1b163a78559a048ca6cc5b099f34c6cd6

Avoid GPU data race also with OpenCL

Implements the same change to non-local stream synchronization as now
used for CUDA.

Fixes #1756

Change-Id: I720edc0951f97dcff0bd477084fff45a149f01d9

Enable 1 PP + 1 PME node

Change-Id: I18a4c2bac71f1b5b81d9d374b212bfb9edc7a1e8

Add checks for inefficient resource usage

Added checks for using too many OpenMP threads and, when using GPUs,
for using a single OpenMP thread. A fatal error is generated in cases
where we are quite sure performance is very sub-optimal. This is
nasty, but a fatal error is the only way to ensure that users don't
ignore this warning. The fatal error can be circumvented by explicitly
setting -ntomp; in that case a note is printed to the log and stderr.

Now also avoids rank counts with thread-MPI that don't fit with the
total number of threads requested.

With a GPU and without DD, the thread count limit is now doubled.

Disabled GPU sharing with OpenCL.

Change-Id: Ib2d892dbac3d5716246fbfdb2e8f246cdc169787

Add support for flushing WDDM queue

Relevant only with CUDA on Windows (and profiling?)

On Windows the WDDM driver (default for non-Tesla) can prevent
immediate submission of CUDA tasks to the GPU in an attempt
to amortize driver overheads. However, as we need
tasks to start immediately for optimal concurrent execution,
this "feature" will result in large overheads. A well-
documented workaround is implemented by this change.

Change-Id: I69a6bb59dc8cae18fba539de49c977c0ee814d07

Merge "Merge branch release-5-0"

Fix OS X openCL builds

OS X does not like the quotes previously used to handle
OpenCL include paths with spaces - escape them instead.
With this change, OpenCL works at least on Yosemite
(OS X 10.10) using a GeForce GT 750M card, and passes
all Gromacs regression tests.

Change-Id: I2acd30256e2ff11ca1fde10361cc0cc55ee7fc05

Fix bugs in gmx dos

- Velocity autocorrelations were not normalized
  by default, so they did not agree with gmx velacc.
- The normalize option had no effect on the VACs.
- The index group option was available, but no
  index groups were processed.
- Since the DoS is calculated from the mass-weighted
  VAC and by default only from the real part, it was
  not clear why these results would differ from data
  obtained with gmx velacc. There is at least a note
  about this now, and more docs will be added in the
  future.
- The hidden option to dump some plots has been
  removed since it was not documented what these
  contained (beyond a paper reference), and the
  contents were not based on any data from the
  trajectory, but rather plotting a custom function.
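A common way to normalize an autocorrelation function is to divide by its zero-lag value so that C(0) = 1. The sketch below assumes that convention; the commit does not spell out the exact normalization gmx dos and gmx velacc apply, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: normalize an autocorrelation function by its
// zero-lag value so that the first element becomes 1.
std::vector<double> normalizeAutocorrelation(std::vector<double> acf)
{
    assert(!acf.empty() && acf[0] != 0.0);
    const double c0 = acf[0];
    for (double& c : acf)
    {
        c /= c0;
    }
    return acf;
}
```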

Fixes #1608.

Change-Id: Icfca060f94efb34bd7871bd90245ab0ddbbe91c1

Replace functions deprecated in OpenCL 1.2

Check for CL_VERSION_1_2 in the source, and
use newer versions in that case to avoid
warnings about deprecated functions.

Change-Id: I6f70e0178fa06c59be57168d94aae0fd7df148f5

Only accept exact matches for selection keywords

The selection parser tried to be nice to the user and also accept
unambiguous prefixes of keywords, but this also has a lot of side
effects that can be confusing (e.g., it was impossible to create
variables that had names that were prefixes to keywords or to other
variable names). Additionally, this was the only case where user input
could cause an exception during tokenization of the input string, and
that wasn't handled very well during interactive input, either (it
caused the whole program to stop, instead of just reporting the error
like is done for other parsing errors).

Remove the logic, and only make the parser accept exact matches.
Add a few synonyms for keywords where there is a natural abbreviation.
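A sketch of exact-match lookup with explicitly registered synonyms. The class and the keywords in the test are illustrative assumptions, not the actual selection keyword table.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical keyword table: only exact matches resolve.
class KeywordTable
{
public:
    void addKeyword(const std::string& name, int id) { ids_[name] = id; }

    // Explicitly registered abbreviation for an existing keyword.
    void addSynonym(const std::string& synonym, const std::string& name)
    {
        const auto it = ids_.find(name);
        assert(it != ids_.end());
        if (it != ids_.end())
        {
            const int id  = it->second;
            ids_[synonym] = id;
        }
    }

    // Returns the keyword id, or -1 if the word is not an exact match.
    // Prefixes are deliberately not accepted.
    int lookup(const std::string& word) const
    {
        const auto it = ids_.find(word);
        return it == ids_.end() ? -1 : it->second;
    }

private:
    std::map<std::string, int> ids_;
};
```

For example, after registering "residue" and "resname", lookup("resn") returns -1 rather than resolving to "resname", so "resn" is free to be used as a variable name.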

Change-Id: I6041baa2f5a3b7dab87c3d5991e883d2d74ace66

Merge branch release-5-0

Conflicts:

src/gromacs/gmxpreprocess/hackblock.c
  Used new name for header file for gmx_warning.

src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda.cu
  Moved code to the other side of the sync point as
  in release-5-0. Renamed cu_nb to nb.

src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_data_mgmt.cu
  Changed name of event to destroy. Renamed cu_nb to nb.

Change-Id: Iee9e2ea372ee704057a4a51ad9e4ab9a22ab7fe6

Documented build types in developer guide

Change-Id: I9212c0399b59c69845f20322b7d8ace3de0b61c5

Merge release-4-6 into release-5-0

Conflicts:

  src/gmxlib/nonbonded/nb_free_energy.c
Deleted - change already exists in release-5-0

  src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda.cu
Applied same reordering to the (unchanged) release-5-0 code

  src/mdlib/nbnxn_cuda/nbnxn_cuda_types.h
Renamed misc_ops_and_local_H2D_done, and added Doxygen for it

Change-Id: I4c34d168af347a59b7821da6fea71a4715ec5bae

Implement OpenCL support

StreamComputing (http://www.streamcomputing.eu) has implemented the
short-ranged non-bonded interaction acceleration features previously
accelerated with CUDA using OpenCL 1.1. Supported devices include
GCN-based AMD GPUs and NVIDIA GPUs.

Compilation requires an OpenCL SDK to be installed. For NVIDIA GPUs,
this is included in the CUDA SDK.

The overall project is not complete, but Gromacs runs correctly on
supported devices. It only runs fast on AMD devices, because of a
limitation in the Nvidia driver. A list of known TODO items can be
found in docs/OpenCLTODOList.txt. Only devices with a warp/wavefront
size that is a multiple of 32 are compatible with the implementation.

Known issues include that tabulated Ewald kernels do not work (but the
analytical kernels are on by default, as with CUDA), and the blocking
behaviour of clEnqueue in Nvidia drivers means no overlap of CPU and
GPU computation occurs. Concerns about concurrency correctness with
context management, JIT compilation, and JIT caching mean several
features are disabled for now. FastGen is enabled by default, so the
JIT compilation will only compile kernels needed for the current
simulation.

There is some duplication between the two GPU implementations, but
the active development expected for both of them suggests it is
not worthwhile consolidating the implementations more closely.

Change-Id: Ideaf16929028eb60e785feb8298c08e917394d0f

Add MSAN build type

This permits GROMACS to build with Memory Sanitizer in clang >= 3.4.

Refactored the tests that linking to libxml2 and zlib work, which is
simpler to follow now that there are three paths. The new MSAN path is
useful because making try_compile tests work with MSAN is tricky, and
only useful for very few developers.

Fixed a missing header exposed by using a different C++ library.

Documented use in the developer guide.

Change-Id: Ia3e8077ac732386563eebfa54f2f7d71ebd74a33

Fix bug removing multiple dihedrals in main rtp entries

The Gromacs-5.0 series has had a serious bug where pdb2gmx
would only consider the first entry when several explicit
bonds were listed for the same atoms in an RTP entry. Older
topologies have worked fine.

Fixes #1704, #1755.

Change-Id: I0b34aeb905dab8ea66196cabc0745583ef6d7209

Fix all use of nbnxn_simd.h

Added nbnxn_simd.h to the set of SIMD headers for which we do
per-commit testing for correct use.

Fixed a bunch of files that were acquiring the dependency
transitively, which is error-prone.

Change-Id: Ic8fb462c7790a5e723f901700dd06fd09543e58a

Make some unit tests skip file system access

Make unit tests for FileNameOptionManager not use the file system.
Introduce a FileInputRedirectorInterface to support mocking file
existence checks, and use a mock implementation in the tests.

These particular tests did quite a bit of file system access, and the
speedup is only a few ms, although significant in relative terms (something
like 80%). But there can be tests where this has more effect, and this
approach provides a starting point for more work on eliminating
unnecessary file system access from the tests.

The main benefit is clearer and more robust test code, as it is no
longer necessary to construct actual files and ensure that they do not
conflict with other tests or cause issues if the test crashes or such.
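The mocking approach can be sketched like this. FileInputRedirectorInterface is the name used in the commit, but its fileExists() method, the mock class, and resolveWithExtension() are assumptions made for illustration; the real interface may differ.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

class FileInputRedirectorInterface
{
public:
    virtual ~FileInputRedirectorInterface() {}
    // Assumed method: does a file with this name exist?
    virtual bool fileExists(const std::string& filename) const = 0;
};

// Test double: "existing" files are just names registered in memory,
// so no real files need to be created or cleaned up.
class MockFileInputRedirector : public FileInputRedirectorInterface
{
public:
    void addExistingFile(const std::string& filename) { files_.insert(filename); }

    bool fileExists(const std::string& filename) const override
    {
        return files_.count(filename) > 0;
    }

private:
    std::set<std::string> files_;
};

// Hypothetical code under test: find which candidate extension exists for
// a base name, asking only the redirector.
std::string resolveWithExtension(const FileInputRedirectorInterface& redirector,
                                 const std::string&                  baseName,
                                 const std::vector<std::string>&     extensions)
{
    assert(!extensions.empty());
    for (const std::string& extension : extensions)
    {
        if (redirector.fileExists(baseName + extension))
        {
            return baseName + extension;
        }
    }
    return std::string(); // no candidate exists
}
```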

Change-Id: Ib9a171331e988fa7e74b16078164f477f8296c6e

Removed gmx protonate tool

This tool appears to have been largely unused, since
testing shows it crashes for a normal trajectory all
the way back to 4.6. Since it is only relevant for
united-atom force fields, we'll reduce the maintenance
load by simply removing it for now; it might reappear
in the future.

Refs #1618.

Change-Id: If57e250f0ffbe32bcc948d09b54b225db9724c35

Improve DLB+PME tuning with GPUs

With GPUs, DD dynamic load balancing (DLB) can quickly restrict the
room for PME load balancing too much. In such cases (and only with
DLB=auto) we now first do PME load balancing without DLB and then,
if DLB gets turned on, a second round of PME load balancing.

Also fixed a bug where the fastest choice was reset when DLB limited
the tuning, which often led to even stronger limitations.

Change-Id: I0087e6b8512d5574d8d0fa2db82e6e38279a82f1

Fix CUDA inter-stream synchronization issue

With the introduction of multiple hardware queues in CC 3.5 and later
NVIDIA GPUs, the implicit dependency between tasks in the local and
non-local streams was eliminated. However, the misc_ops_done event
that the non-local stream waits on preceded the local coordinate
transfer. So even though tasks in the local stream are always issued
first, under rare circumstances the non-local kernel could start
before the local coordinate transfer completed. This would lead to
non-local interactions being calculated using coordinates (and
charges) from the previous step.

This change moves the synchronization point, creating a dependency
between the local coordinate transfer and the non-local non-bonded
kernel.

Change-Id: I0b3837d46db6469f6b1d9869a3a73b5176d93d99

Fix double precision reference SIMD and gcc bug

The double-precision logical operations on floating-point values
were incorrect in the reference SIMD implementation for C source,
and GCC appears to be buggy with C++, likely because strict-aliasing
assumptions at -O3 interfere with the required casts. This is solved
by sticking to unions for now. This might be slower, but the
reference implementation is not used in production anyway.
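A minimal sketch of union-based logical operations on doubles, of the kind a reference SIMD layer applies per lane. This is not the GROMACS reference SIMD code; the DoubleBits union and the function names are hypothetical. Strictly speaking, reading an inactive union member is undefined in ISO C++, but GCC documents union type punning as supported, which is why it sidesteps the strict-aliasing problems that pointer casts can trigger at -O3.

```cpp
#include <cstdint>

// View the bits of a double as a 64-bit integer through a union
// rather than through a pointer cast.
union DoubleBits
{
    double        d;
    std::uint64_t i;
};

// Bitwise AND of the representations of a and b.
double logicalAnd(double a, double b)
{
    DoubleBits ua, ub, r;
    ua.d = a;
    ub.d = b;
    r.i  = ua.i & ub.i;
    return r.d;
}

// Bitwise OR of the representations of a and b.
double logicalOr(double a, double b)
{
    DoubleBits ua, ub, r;
    ua.d = a;
    ub.d = b;
    r.i  = ua.i | ub.i;
    return r.d;
}

// (~a) & b, the usual building block for clearing bits.
double logicalAndNot(double a, double b)
{
    DoubleBits ua, ub, r;
    ua.d = a;
    ub.d = b;
    r.i  = ~ua.i & ub.i;
    return r.d;
}

// Absolute value: -0.0 has only the sign bit set, so andnot clears it.
double absViaAndNot(double x)
{
    return logicalAndNot(-0.0, x);
}
```

For example, OR-ing 2.0 with -0.0 sets the sign bit and yields -2.0.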

Change-Id: If048bda298618ae67968861c4a850d080c8cce31

Fix one error and compiler warnings with CUDA & clang-3.6

Clang-3.6 on OS X can now be used by nvcc. Clang found one
error related to || being used instead of | to set flag bits,
and a handful of warnings about unused variables in headers.
The latter is caused by declaring constants in headers and
making them static to avoid clashing symbols. However, this emits
them in every single compilation unit that includes the header.
Fixed by either moving the definitions to a .cpp file or changing
them to defines.
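The || versus | error can be illustrated in isolation. The flag names below are hypothetical, not the ones from the CUDA code. Logical OR yields a bool, which converts to 1, so only the lowest bit survives; bitwise OR keeps every flag bit.

```cpp
// Hypothetical flag bits for illustration.
enum : unsigned int
{
    flagForces = 1u << 0,
    flagEnergy = 1u << 1,
    flagVirial = 1u << 2
};

// The bug: || evaluates to true (1), so flagEnergy is lost.
unsigned int buggyFlags()
{
    return flagForces || flagEnergy;
}

// The fix: | combines the bits, giving flagForces + flagEnergy.
unsigned int fixedFlags()
{
    return flagForces | flagEnergy;
}
```

Because both operands are constants, a compiler can flag the first form at compile time, which is how this kind of mistake tends to surface.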

Change-Id: Ib4d59c40aa8caffc667cc202a3efe45891b2abe3

Append _pullx and _pullf to pull files when -deffnm is used.

Changes -deffnm behavior for pulling so that the pullx and pullf
files don't collide. Previously, the collision caused one file to be
backed up and checkpoint restarts to fail when -deffnm was used.
(Technically this applies to anything where -px and -pf are identical
and not explicitly set, but that only happens with -deffnm.)

Additionally, issue a fatal error if -px or -pf is set and the output
files collide.
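The naming rule above can be sketched as a small helper. This is a hypothetical illustration, not the GROMACS pull code: the PullFileNames struct, the pullOutputNames() function, the .xvg extension and the use of an exception in place of a fatal error are all assumptions. An empty string stands for "option not given".

```cpp
#include <stdexcept>
#include <string>

// Hypothetical pair of pull output file names.
struct PullFileNames
{
    std::string x; // pull coordinates (-px)
    std::string f; // pull forces (-pf)
};

// With -deffnm, names that were not set explicitly get distinct
// _pullx/_pullf suffixes; names that still collide are rejected.
PullFileNames pullOutputNames(const std::string &deffnm,
                              const std::string &px,
                              const std::string &pf)
{
    const std::string ext = ".xvg"; // assumed extension
    PullFileNames     names;

    if (!px.empty())
    {
        names.x = px;
    }
    else
    {
        names.x = deffnm.empty() ? "pullx" + ext : deffnm + "_pullx" + ext;
    }
    if (!pf.empty())
    {
        names.f = pf;
    }
    else
    {
        names.f = deffnm.empty() ? "pullf" + ext : deffnm + "_pullf" + ext;
    }

    // Stand-in for a fatal error when the two outputs would overwrite each other.
    if (names.x == names.f)
    {
        throw std::runtime_error("pull output files collide: " + names.x);
    }
    return names;
}
```

For example, -deffnm md would then produce md_pullx.xvg and md_pullf.xvg under these assumptions.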

The fix is now localized to the pull code.

Fixes #942 except for log file collision with pull-rotation.

Change-Id: I27b8b4ced0f307905e2c2ea4fb260376dd25dc32

Remove unused cmake files

Change-Id: Id1a132e316539243dafa06a17ae2e8f69dc9f448

Don't use check_library for libm

Using check_library doesn't work with built-ins like sqrt when
-Werror is used. sqrt was only a placeholder for a standard libm
function anyway. We really only need to know whether the library
exists.

Fixes #1750

Change-Id: I6a550cf8c8b8ea985b28130a4339935fb8c9741a

Improve pair search thread load balance

With very small systems and many OpenMP threads, especially when
using GPUs, some threads can end up without pair search work. Better
load balancing reduces the pair search time. Also the CPU non-bonded
kernel time is slightly reduced in the extreme parallelization limit.

Change-Id: Ib036ea3ba59f497eeee7afa73a71fb0e0ccd216e

Improved the intra-GPU load balancing

The splitting of the pair list to improve load balancing on the GPU
was based on the number of generated lists. But this number can be
high(er) due to small lists before splitting. This led to too few
lists for small systems and too many for large systems.
Now the splitting is based on the number of pairs in the list so
far. This produces much more stable results.
Because of the more stable results, we increased the min_ci_balanced
factor from 40 to 44 (closer to the ideal 48).
With small systems on many threads we used to generate many more lists
than targeted. Because the algorithm is now far better, we increased
the minimum list size from 32 to 36 and still get fewer lists.
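The idea of splitting by accumulated pair count, rather than by number of lists, can be sketched as a greedy chunking. This is a simplified illustration, not the nbnxn implementation: the function splitByPairCount() and its parameters are hypothetical, with minChunkPairs playing a role loosely analogous to the minimum list size.

```cpp
#include <algorithm>
#include <vector>

// Split work items (given as pair counts) into chunks of roughly equal
// total pair count, closing a chunk once it reaches the per-chunk target.
std::vector<std::vector<int>> splitByPairCount(const std::vector<int> &pairCounts,
                                               int                     targetChunks,
                                               int                     minChunkPairs)
{
    if (targetChunks < 1)
    {
        targetChunks = 1;
    }
    long long total = 0;
    for (int c : pairCounts)
    {
        total += c;
    }
    // Ceiling division, but never below the minimum chunk size.
    const long long perChunk =
        std::max<long long>(minChunkPairs, (total + targetChunks - 1) / targetChunks);

    std::vector<std::vector<int>> chunks;
    std::vector<int>              current;
    long long                     currentPairs = 0;
    for (int c : pairCounts)
    {
        current.push_back(c);
        currentPairs += c;
        if (currentPairs >= perChunk)
        {
            chunks.push_back(current);
            current.clear();
            currentPairs = 0;
        }
    }
    if (!current.empty())
    {
        chunks.push_back(current); // remainder forms the last chunk
    }
    return chunks;
}
```

With six items of 5 pairs, a target of 6 chunks and a minimum of 12 pairs, this yields 2 chunks of 15 pairs: the minimum size wins over the target, which mirrors why raising the minimum list size reduces the number of lists for small systems.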

Change-Id: Id2210171a409ef1a27f7dc919fe806f0fe4d869c

Fix CUDA architecture dependent issues

Only device code is compiled in multiple passes (one per target
architecture), so architecture-dependent macros like __CUDA_ARCH__
or our own IATYPE_SHMEM (which also depends on __CUDA_ARCH__) are not
usable in host code, where both are undefined. As a result, the
current code over-allocated dynamic shared memory, although this had
no negative side effects.
This change replaces the use of macros with runtime device compute
capability checks. Texture objects are also now actually enabled,
which gives a very minor performance improvement.
Note that on Maxwell + CUDA 7.0 there is a 20% performance regression
for the tabulated Ewald kernel (which is not used by default), which
magically disappears when texture references are used instead.
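The contrast between a compile-time architecture macro and a runtime compute-capability check can be sketched in plain C++. This is a hypothetical illustration: the ComputeCapability struct stands in for the major/minor fields that cudaGetDeviceProperties() fills in, and the assumption that atom types are staged in shared memory only below CC 3.0, as well as the byte counts, are placeholders rather than the actual GROMACS values.

```cpp
#include <cstddef>

// Stand-in for the major/minor fields of cudaDeviceProp, so this
// compiles without the CUDA toolkit.
struct ComputeCapability
{
    int major;
    int minor;
};

// In host code (and in any plain C++ compiler) __CUDA_ARCH__ is
// undefined, so this returns false there: the macro cannot drive
// host-side decisions such as shared-memory sizing.
bool cudaArchVisibleHere()
{
#ifdef __CUDA_ARCH__
    return true;
#else
    return false;
#endif
}

// Hypothetical runtime replacement for an IATYPE_SHMEM-style switch
// (the CC < 3.0 threshold is an assumption).
bool useAtomTypeSharedMem(const ComputeCapability &cc)
{
    return cc.major < 3;
}

// Dynamic shared memory per block, sized from the runtime check instead
// of a macro that is undefined on the host.
std::size_t dynamicSharedMemBytes(const ComputeCapability &cc,
                                  std::size_t              baseBytes,
                                  std::size_t              atomTypeBytes)
{
    return baseBytes + (useAtomTypeSharedMem(cc) ? atomTypeBytes : 0);
}
```

Because the decision is made from the device actually in use, the host no longer has to allocate for the worst case across all compiled architectures.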

Change-Id: I1f911caad85eb38d6a8e95f3b3923561dbfccd0e