BioD PNPI Git Repos - alexxy/gromacs.git/log

Merge branch release-5-1 into release-2016

Change-Id: Id782b78da4b2080fec208d29b9b4afc363f1418c

Fixes incorrect charges in OPLS/AA-L for HISD and HISE

With the present charges, the results of Michael Shirts et al.,
J. Chem. Phys. 119 (2003) 5740, can be reproduced.

Fixes #2013

Change-Id: Ifb037961499633210d26ff32ec6bf3bebc505461

Add toolchain flags for SSE2 compilation

This is how the code should be, though my testing didn't need it.

Also fixed function documentation.

Fixes #2008

Change-Id: Ic1e1c16d22e054e9f741d91c667305de7398cd30

Bumped patch number to prepare for future 5.1.4 release

Change-Id: I813e0d8c976d0afca316f5cfe095c5647b8fd66c

Fix issues with using int for number of steps

Mostly we use a 64-bit integer, but we messed up a few
things.

During mdrun -rerun, edr writing complained about the negative step
number, implied it might be working around it, and threatened to
crash, which it can't do. Silenced the complaint during writing,
and reduced the scope of the message when reading.

Fixed TNG wrapper routines to pass a 64-bit integer like they should.

Made various infrastructure use gmx_int64_t for consistency, and noted
where in a few places the practical range of the value stored in such
a type is likely to be smaller. We can't extend the definition of XTC
or TRR, so we're stuck. TNG is already good, though.
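
As an illustration of the problem (a minimal sketch, not GROMACS code), the snippet below shows how a step count just past the 32-bit limit turns negative when stored in a 32-bit signed integer, while a 64-bit integer keeps it. The helper names as_int32 and as_int64 are hypothetical.

```python
import ctypes


def as_int32(value: int) -> int:
    """Return value as it would be stored in a 32-bit signed integer."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """Return value as it would be stored in a 64-bit signed integer."""
    return ctypes.c_int64(value).value


big_step = 2**31 + 5            # a step count just past the 32-bit limit
print(as_int32(big_step))       # wraps around to a negative number
print(as_int64(big_step))       # preserved unchanged
```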

Fixes #2006

Change-Id: If485d9d92cb4b99a3bbe25e8e9fa082fc3fccd5f

Version 5.1.3

Bumped minor soversion and regressiontest hash

Change-Id: I55f1cc4f1154f053512c8cc3509fd0be8e3c347b

Fix trr magic number reading

The trr header-reading routine returned an "OK" value if the magic
number was wrong, which might lead to chaotic results everywhere,
because checking of return values tends not to happen (even when
they're right). Other GROMACS magic-number reading routines tend to
give a fatal error if the number is wrong, e.g. by reading a file
written in wrong endianness. (This should never be a thing for XDR
files, which are defined to be big endian, but such code has existed.)

This change adds/restores that behaviour to trr reading, along
with separating the behaviour of failing to read a magic integer
from reading one that doesn't match.
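
The separation described above could look roughly like the following sketch. It is not the GROMACS routine: the exception names, the function check_magic and the constant value are assumptions made for illustration only. It distinguishes a failed read of the 4-byte big-endian integer from a successful read of the wrong value.

```python
import struct

TRR_MAGIC = 1993  # assumed value, used here for illustration only


class MagicReadError(Exception):
    """The magic integer could not be read at all."""


class MagicMismatchError(Exception):
    """A magic integer was read, but it has the wrong value."""


def check_magic(data: bytes, expected: int = TRR_MAGIC) -> None:
    if len(data) < 4:
        raise MagicReadError("could not read a 4-byte magic number")
    # XDR integers are defined to be big endian.
    (magic,) = struct.unpack(">i", data[:4])
    if magic != expected:
        raise MagicMismatchError(
            f"magic number {magic} does not match expected {expected}")


check_magic(struct.pack(">i", TRR_MAGIC))  # a valid header passes silently
```

A header written with the wrong endianness ends up in the mismatch branch, while a truncated file ends up in the read-failure branch.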

Fixes #1926

Change-Id: I3cdd8ae9172e3b95fc232d8fa31a442d239233db

Fix gmx mdrun -membed -ntmpi -1

This used to give a fatal error if we'd chosen more than one rank, but
instead we should choose to use a single rank. Now that we have
several kinds of things that want to trigger such handling, build some
infrastructure to handle it. This allows us to separate the trigger
logic from how we report back to the user.

Change-Id: If01177b590c1a817113c642c30fd30d6d4bc3eab

Fix membed with partial revert of 29943f

The membrane embedding algorithm must be initialized before
we call init_forcerec(), so it cannot trivially be moved into
do_md(). This has to be cleaned up anyway for release-2017
since we will remove the group scheme by then, but for now
this fix will allow us to have the method working in release-2016.

Fixes #1998.

Change-Id: I8e16a67b8df16999793649d79eebca051b8b474f

Fix for incorrectly setting expanded ensemble when fep_state = 0

Expanded ensemble lambda values were not being copied over when the
proposed fep_state was 0. This commit fixes that by removing a
conditional that was incorrectly added.

Fixes #1995

Change-Id: I3d49b0936d973fb70a9a79799743f5069ba4fff4

Fix regressiontest download md5sum

I forgot that the md5sum changes with the decision to
name it rc1 rather than beta3, because the directory
name within the tarball changes.

Change-Id: I5b5f5b9de43ee5771fbdfb53842609f1635f3f38

First 2016 release candidate

Change-Id: I54a3e3c33015eb42ba5c18994ef76ce1bb2888c5

Hardware detection clean-up and pre/post-processing

This change cleans up the various tests and hacks we have
had in different places to remove a number of false warnings
and errors. It also unifies all this processing to two
small routines that are called just before/after the
hardware is detected, so the user can choose whether it
should be done or not in other places.

- Rather than trying to guess when we should or shouldn't
  override the number of cores online, the preprocessing
  uses a piece of code that allows sleeping cores to come
  online automatically by running a small C++11 thread loop before
  doing the hardware topology detection. This way we can
  remove all ARM-specific paths. To avoid wasting a second
  on systems where SMT is disabled, we avoid calling it on x86.
- All SMT warnings are handled in the post-processing call,
  but only as notes in the log file to avoid writing
  warnings on stderr.
- The check for OpenMP thread mismatch has been removed
  since it caused incorrect warnings by comparing the number
  of threads configured for each OpenMP process with the
  total number of cores in the entire system. We will have
  to rewrite this later as a test in the MPI/OpenMP
  parallelization setup instead.

By sticking with the hwloc/sysconf-online detection, we should
now also handle all special cases where cores have been taken
offline manually in a correct way without using hardware-specific
paths.

Change-Id: I37edb3eada3f4c8c0906c641c7041cc0270985e8

Prevent fragile use cases of checkpoint appending

There are way too many ways we allow runs to be continued
and extended. We still allow the checkpoint file to be
missing (so -cpi can be used for all command lines), but
we warn if it is not found. To avoid mistakes with file
appending when restarting from checkpoints, we now require
that all previous output files must be present
(unless -noappend is used), and that the file names must
match the ones used in the previous run.
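
A minimal sketch of the rule described above, assuming a hypothetical list of file names recorded by the previous run. The functions missing_outputs and can_append are illustrative names, not mdrun code.

```python
import os


def missing_outputs(expected_files, directory="."):
    """Return the expected output files that are absent from directory."""
    return [name for name in expected_files
            if not os.path.isfile(os.path.join(directory, name))]


def can_append(expected_files, directory=".", noappend=False):
    """Appending requires every previous output file to be present."""
    if noappend:
        # Nothing is appended, so no previous files are required.
        return True
    return not missing_outputs(expected_files, directory)
```

Reporting the list from missing_outputs in the error message would tell the user exactly which files prevent appending.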

Fixes #1777.

Change-Id: Id9e89773a4a9214be6dbb76676c526e98e12bd37

Update affinity checking for Intel compiler

Since release 2013, the Intel compiler no longer sets
OpenMP affinity by default, according to the Intel docs. This
means the code that tried to disable it by setting
environment variables no longer had any effect (and we
likely have not checked it the last few years). In
addition, it is not ideal behaviour in a library either to
assume that no other code has executed OpenMP prior to a call
or to set environment variables that influence other code
execution too. However, we will still disable Gromacs' own
thread affinity if it has been set by the user or calling code.

Change-Id: I57a952766e87a35483f1960b349f7c30d5b85f24

Print working dir before command line

When running GROMACS via a batch script, it is useful to know which
working dir is being used for relative paths (file names) in the
command line.

Change-Id: Iab6701e09ad3b0386b59c2bdda2c4f908fdc2d0a

Merge "Merge branch release-5-1 into release-2016" into release-2016

Updated g_wham for the new pull setup

This brings g_wham up to date with the new pull setup where the pull
type and geometry can now be set per coordinate and the pull
coordinate has changed and is more configurable.

Change-Id: I629fee9dfea9715e2bd54685cd1c203c66d05089

Merge branch release-5-1 into release-2016

Change-Id: I976ad584d1c8757c6464aae482cb5a4c245beacb

Introduce fatal error for too few frames in gmx dos.

To prevent gmx dos from crashing with an incomprehensible error
message when there are too few frames, test for this.

Part of #1813

Change-Id: Ie2f23d68cb3d4570944c4ade5ced49873dc98a29

Simplify SIMD compiler flag construction

Lots of code duplication for handling both C and C++ is removed. Some
new helper functions handle the language-specific aspects separately
from the main logic. Introduced the notion of preparing the toolchain.
Removed one lot of status messages now that the try_compile status
message makes clear that it is checking for flags for SIMD
support.

Moved checking code that caters for ancient clang for AVX_128_FMA to a
common point, eliminating some variables. Moved that point to before
the AVX_128_FMA compiler flag tests, which seems more likely to work
in this case. This unlikely combination is of little significance,
however.

Change-Id: I5377a7951f5414ee52c4960fe63c999f55ff5109

Add detection for ARMv7 cycle counter support

ARMv7 requires special kernel settings to allow cycle
counters to be read. This change adds a cmake setting
to enable/disable counters. On all architectures but ARMv7
it is enabled by default, and on ARMv7 we run a small test
program to see if it can be executed successfully. When
cross-compiling to ARMv7 counters will be disabled, but
either choice can be overridden by setting a value for
GMX_CYCLECOUNTERS in cmake.

Fixes #1933.

Change-Id: I1e217d7a09f84a6bcf4eb5bf4a656d430465c915

Work around compiler issue with random test

gcc-4.8.4 running on 32-bit Linux fails a few
tests for random distributions. This seems
to be caused by the compiler doing something
strange (that can lead to differences in the lsb)
when we do not use the result as floating-point
values, but rather do exact binary comparisons.
This is valid C++, and bad behaviour of the
compiler (IMHO), but technically it is not required
to produce bitwise identical results at high
optimization. However, by using floating-point
tests with zero ULP tolerance the problem
appears to go away.
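
For readers unfamiliar with ULP-based comparison, here is a minimal sketch for double precision. It is not the GROMACS test framework; the function names are hypothetical. With a tolerance of zero ULPs, two values compare equal only if they are the same floating-point number (with +0.0 and -0.0 treated as equal).

```python
import struct


def ordered_bits(x: float) -> int:
    """Map a double to an integer that increases monotonically with x."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles are sign-magnitude, so mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))


def within_ulps(a: float, b: float, max_ulps: int = 0) -> bool:
    return ulp_distance(a, b) <= max_ulps
```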

Fixes #1986.

Change-Id: I252f37b46605424c02435af0fbf7a4f81b493eb8

Add PDF metadata to the manual

Not only nice, but should also help search engines.
Also make sure that the PDF is UTF8 (and is tagged as unicode too).

Replaced in the bib a Unicode minus sign (U+2212) that was causing
trouble with the standard hyphen-minus.

Change-Id: I4708084282cad24b62369101e6155d99dc236981

User guide clarifications

Clarified OpenCL and CUDA as well as pinning in the glossary.

Change-Id: Ib3b74805812e8c02545819873e077f9d7ec9bab4

Reduce hwloc & cpuid test requirements

On some non-x86 linux platforms hwloc does not report
caches, which means it will fail our strict test
requirements of full topology support. There is no
problem whatsoever with this, so we reduce the
test to only require basic support from hwloc - this
is still better than anything we can get ourselves.
Similarly for CPUID, it is not an error for an
architecture to not provide any of the specific flags
we have defined, so avoid marking it as such.

Fixes #1987.

Change-Id: I0a065296bc647b7f7f5d3cb178e88df80fac81a7

Move consistency check for number of threads

Recent refactoring meant that the count of threads known to OpenMP was
only used to provide a consistency check in the fallback case of
sysconf detection.

Note that no version of the HardwareTopology code ever used
information from OpenMP as an input.

Parameters for implementing warnings currently remain in
HardwareTopology::detect(), so that the sysconf case can continue to
handle cases where we think we know how to advise the user. This
will go away in master branch with the new logging module.

Change-Id: Iaf383abc420d2f04b6514e8856c9f1414cbba55c

Change handling for missing releng docs

Use a placeholder document from the source tree instead of generating it
on the fly in case RELENG_PATH is not set.

Also fix other warnings that got generated with RELENG_PATH not set.

Fixes #2000

Change-Id: I80b075a39eeff36e64b78cd58b09bc5bdde6ffad

Add user guide info on CUDA application clocks

The new paragraph explains what application clocks are, how they are/can be
used, as well as the specifics of the mdrun behavior.

Change-Id: Ibfd1d058d9162002ca0c1b148fc44bad29cc3180

Fix build if libcuda is found but nvcc is not

We require nvcc to be found by find_package, so this commit makes that
explicit. Otherwise the code doesn't understand CUDA_MAJOR_VERSION,
nor why we are setting cache variables that have no value to advanced.

Change-Id: Id7ebde308d268bf225f4b1220bc2b20c0e3e6da1

Relax pull PBC check

The check in the pull code for COM distances close to half the box
was too strict for directional pulling. Now dimensions orthogonal
to the pull vector are no longer checked. The check was actually
not strict enough for directional pulling along x or y in triclinic
unit cells, but that is a corner case.
Furthermore, the direction-periodic hint is now only printed with
geometry direction.

Added tests for the maximum pull distance calculation.

Fixes #1962.

Change-Id: I8e389ba3f0490ca67586fd10bdc9d71d9957ab45

Document GMX_GPU_APPLICATION_CLOCKS

Added description for the use of the GMX_GPU_APPLICATION_CLOCKS
environment variable.

Change-Id: Id275b5a607d5b4dc23f88c71cc1eebfb5418fb50

Add checks for too much memory in g_nmeig

g_nmeig could request storage for eigenvector output and matrices
for more than INT_MAX elements, but nearly all loop variables are int.
Now a fatal error is produced in this case. This also avoids the
confusing error message when too much memory is requested; snew
will get the correct size, but gmx_fatal prints it as %d.
Removed double allocation of eigenvectors.
Added support for -first > 1 with sparse matrices.
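
The kind of guard described above can be sketched as follows. This is an illustration under assumptions, not the g_nmeig code: the function name is hypothetical and INT_MAX is taken to be the usual value for a 32-bit int. The element count is computed in a wide integer before comparing, so the check itself cannot overflow.

```python
INT_MAX = 2**31 - 1  # usual INT_MAX for a 32-bit int (an assumption here)


def check_element_count(nrow: int, ncol: int) -> int:
    """Return nrow * ncol, or raise if it cannot be indexed with an int."""
    count = nrow * ncol  # Python integers do not overflow
    if count > INT_MAX:
        raise ValueError(
            f"requested {count} elements, which exceeds the maximum of "
            f"{INT_MAX} that int loop variables can index")
    return count
```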

Change-Id: If425457afb532a5116146cd69c3ff712f43d541d

Fixes for Power7 big-endian

Now compiles and passes all tests in both double and single precision
with gcc 4.9.3, 5.4.0 and 6.1.0 for big-endian VSX.

The change for the code in incrStoreU and decrStoreU addresses an
apparent regression in 6.1.0, where the compiler thinks the type
returned by vec_extract is a pointer-to-float, but my attempts at a
reduced test case haven't reproduced the issue.

Added some test cases that might hit more endianness cases in future.

We have not been able to test this on little-endian Power8; there is
a risk the gcc-specific permutations could be endian-sensitive. We'll
test this when we have hardware access, or if somebody runs the tests
for us.

Fixes #1997.
Refs #1988.

Change-Id: Iede0eac22504b22973f1a40d2b0180f10a34b7ed

Disable NVIDIA JIT cache with OpenCL

The NVIDIA JIT caching is known to be broken with OpenCL compilation in
the case when the kernel source changes but the path does not change
(e.g. kernels get overwritten). Therefore we disable the JIT caching on
NVIDIA.

Fixes #1938

Change-Id: I68749ea695a891ab8f14f07fc830ce632299b0c8

Add grompp check for pull group

Added a check in grompp for valid pull groups in a pull coordinate.
Using a pull group out of range would cause invalid memory access
in grompp.

Change-Id: Icfca671cd551b618eafa67329f75efc5a7f7d945

Fix use of _POSIX_THREAD*

This fixes a couple of aspects of behaviour. Formerly, if
_POSIX_THREADS was defined and equal to zero, we might have used
clock_gettime and got some kind of error (compiling/linking/runtime
behaviour). Similarly, if _POSIX_THREADS was undefined, C99 defines
such preprocessor symbols as zero, so we again used clock_gettime
inappropriately.

Now we avoid compiler warnings if the symbol is undefined, and when it
is defined we use clock_gettime only when _POSIX_THREADS has a value
such that it is supposed to work.

Adapted this and the BG/Q fix also for
gmx_gettime_per_thread(). Expanded the documentation of why the code
is the way it is. Noted future TODO to consider std::chrono.

Fixes #1980

Change-Id: Ib3e40903e2344354074c5328d40e8467f264b51f

Merge branch release-5-1 into release-2016

The changes to checks in release-5-1 to check_nthreads_hw_avail() are
all incorporated into detectLogicalProcessorCount().

Change-Id: Ifbee24dc40630988a14026b9f45cfaf6e92390ba

Refactor sysconf checking

The introduction of layered checking for hardware topology left
checking based on sysconf and OpenMP applying in cases where it is
unclear whether the conclusions are valid, and surely difficult to
test reliably.

Instead, those checks should apply only to sysconf-based detection
(which is the final fallback case). Those are refactored here with a
view to code changes that will merge shortly from release-5-1.

Removed a one-line output to the debug stream.

Change-Id: Ide56fa8d6817608fba9f29a05da10034ae543b75

Work around glibc 2.23 with CUDA

Fixes #1982

Change-Id: I24671fcbdfdf1fb8bcc178edaeb801e849959266

Properly reset CUDA application clocks

We now store the application clock values we read when starting mdrun
and reset to these values, but only when clocks have not been changed
(by another process) in the meantime.
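
The reset logic can be sketched with a stand-in device object. Everything here (FakeGpu, ClockGuard, the clock values in the usage line) is hypothetical and only mirrors the rule stated above: remember the values found at start-up, and restore them at the end only if the clocks still hold the values this process set.

```python
from dataclasses import dataclass


@dataclass
class FakeGpu:
    """Stand-in for a device; clocks is (memory_MHz, graphics_MHz)."""
    clocks: tuple


class ClockGuard:
    def __init__(self, gpu: FakeGpu, wanted: tuple):
        self.gpu = gpu
        self.original = gpu.clocks  # values read when starting
        self.ours = wanted          # values this process sets
        gpu.clocks = wanted

    def restore(self) -> bool:
        """Reset to the original clocks unless someone else changed them."""
        if self.gpu.clocks != self.ours:
            return False  # changed in the meantime, so leave them alone
        self.gpu.clocks = self.original
        return True


gpu = FakeGpu(clocks=(2505, 562))
guard = ClockGuard(gpu, (2505, 875))
guard.restore()  # clocks are back at (2505, 562)
```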

Fixes #1846.

Change-Id: I722d7153202e8f4c6a5330948dcbef06bb6acf28

Update to include a gcc 6.1 configuration

Change-Id: I806b335f84624f143d91fc9249d29ab08ca6a05a

Fix SIMD configuration management

Subsequent runs of cmake gave inconsistent diagnostic messages because
SUGGEST_BINUTILS_UPDATE was not set on subsequent runs because we were
caching the result of logic, as well as caching the results of
compilation tests. This made life confusing, e.g. when compiling with
gcc on MacOS with clang assembler not available.

Instead, we now re-run the fast logic (quietly, if this is a
subsequent run).

Improved the handling of ${VARIABLE}, because there was no need to use
FORCE because the semantics of an unset variable in CMake just work.
There was also no need for such variables to be put into the cache,
and we were using one more variable than we needed to use. This meant
it was no longer worth implementing the redundant hints about perhaps
updating the binutils package, nor suppressing the redundant special
status-line output.

Noted some TODOs for future simplification. Changed the use of SIMD to
SOURCE, since this utility code doesn't have to relate to SIMD flags.

Change-Id: Id9605ccff0903c55e2621ddd8af10c8da523bebe

Removed unnecessary inter-simulation signalling

Generally, multi-simulation runs do not need to couple the simulations
(discussion at #692). Individual algorithms implemented with
multi-simulations might need to do so, but should take care of their
own details, and now do. Scaling should improve in the cases where
simulations are now decoupled.

It is unclear what the expected behaviour of a multi-simulation should
be if the user supplies any of the possible non-uniform distributions
of init_step and nsteps, sourced from any of .mdp, .cpt or command
line. Instead, we report on the non-uniformity and proceed. It's
always possible that the user knows what they are doing. In
particular, now that multi-simulations are no longer explicitly
coupled, any heterogeneity in the execution environment will lead to
checkpoints and -maxh acting at different time steps, unless a
user-selected algorithm requires that the simulations stay coordinated
(e.g. REMD or ensemble restraints).

In the implementation of signalling, we have stopped checking gs for
NULL as a proxy for whether we should be doing signalling at that
communication phase. Replaced with a helper object in which explicit
flags are set. Added unit tests of that functionality.

Improved documentation of check_nstglobalcomm. mdrun now reports the
number of steps between intra-simulation communication to the
log file.

Noted minor TODOs for future cleanup.

Added some trivial test cases for termination by maxh in normal-MD,
multi-sim and REMD cases. Refactored multi-sim tests to make this
possible without duplication. This is complicated by the way filenames
get changed by mdrun -multi by the former par_fn, so cleaned up the
way that is handled so it can work and be re-used better. Introduced
mdrun integration-test object library to make that build system work a
little better. Made some minor improvements to Doxygen setup for
integration tests.

Fixes #860, #692, #1857, #1942.

Change-Id: I5f7b98f331db801b058ae2b196d79716b5912b09

Use sysconf(_SC_NPROCESSORS_ONLN)

If we're not on ARM and sysconf(_SC_NPROCESSORS_ONLN) doesn't match
sysconf(_SC_NPROCESSORS_CONF), we should use the former, as that is
the correct count on x86 with hyperthreading disabled in the kernel.
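
For reference, the same query is available from Python on POSIX systems. The sketch below is not the GROMACS detection code; the function name is hypothetical. It prefers the number of processors currently online and falls back to os.cpu_count() where sysconf or the name is unavailable.

```python
import os


def logical_processor_count() -> int:
    """Prefer the number of processors online; fall back if unavailable."""
    names = getattr(os, "sysconf_names", {})
    if "SC_NPROCESSORS_ONLN" in names:
        online = os.sysconf("SC_NPROCESSORS_ONLN")
        if online > 0:
            return online
    # Platforms without sysconf, or without this name, end up here.
    return os.cpu_count() or 1


print(logical_processor_count())
```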

Added some comments on assumptions and future possible problems.

Fixes #1991.

Change-Id: Id851b8acfbd6b9a2837e8c0e4340b2267a35a20a

Merge branch release-5-1 into release-2016

Omitted the patch that removed support for mdrun -multi -maxh, since
support for that will be reinstated very shortly.

Change-Id: I10e71dddc5c4fb8f32425da7390b64d64e4b16dd

Made gmx dos work again.

Due to an error in the index handling gmx dos always stopped with a fatal
error.

Fixes #1996

Change-Id: Iba7685c1e1b86acc92427902a2187eb4e6c9f260

Fix vsite bug with MPI+OpenMP

The recent commit b7e4f30d caused non-local virtual sites not to be
treated when using OpenMP. This means their coordinates lagged one
step behind and their forces were not spread to the atoms, leading
to small errors in the forces. Note that non-local virtual sites are
only used when local virtual sites use them as a constructing atom;
the most common case is a C/N in a CH3/NH3 group with vsite H's.
Also added a check on the vsite count for debug builds.

Fixes #1981.

Change-Id: Ibe13b75b8ae9841937ad4abc007dba5ad78a30cd

Fix Verlet buffer calculation with nstlist=1

Under rare circumstances the Verlet buffer calculation code was
called with nstlist=1, which caused a division by zero. The division
by zero is now avoided.
Furthermore, grompp now also determines and prints the Verlet buffer
sizes with nstlist=1, which provides the user with information and adds
consistency checks.

Fixes #1993.

Change-Id: I6777f9c18dfcdaee0e4fe3e4609704fb48b5c138

More accurate GPU list splitting

Instead of splitting the GPU lists (to generate more work units)
based on a maximum cut-off, we now generate lists as close to
the target list size as possible. This is more accurate and has
the advantage that we don't use assumptions on the j-group sizes.
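
The difference between the two strategies can be illustrated with a toy example. This is not the nbnxn list-splitting code, and both function names are hypothetical. Cutting at a maximum size leaves a small remainder part, whereas picking the part count first and splitting evenly keeps every part close to the target.

```python
def split_by_max(total: int, max_size: int) -> list:
    """Cut off full-size parts; the last part holds the remainder."""
    parts = [max_size] * (total // max_size)
    if total % max_size:
        parts.append(total % max_size)
    return parts


def split_near_target(total: int, target: int) -> list:
    """Choose a part count, then split as evenly as possible."""
    nparts = max(1, round(total / target))
    base, extra = divmod(total, nparts)
    # The first 'extra' parts take one additional element each.
    return [base + 1] * extra + [base] * (nparts - extra)


print(split_by_max(100, 30))       # [30, 30, 30, 10]
print(split_near_target(100, 30))  # [34, 33, 33]
```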

Change-Id: I3942a1f42e3d9a2163897030762652ed309d8cf1

Allow use of external lmfit package

For packaging GROMACS for software distributions, we need to permit
maintainers to use external system libraries, rather than the versions
we bundle for convenience for general GROMACS users.

This patch implements the CMake option GMX_EXTERNAL_LMFIT. It
implements FindLmfit.cmake, which relies heavily on pkg-config, which
is supported by lmfit and widely available in the intended use case
(ie distribution maintainers), but can be worked around if lmfit is
available and pkg-config is not and somehow an external lmfit is
needed.

lmfit management now needs to happen at a higher point in the GROMACS
build system, so I have refactored the source code files that use
lmfit functionality. The former gmx_lmcurve.cpp in src/external was in
fact GROMACS code that calls the lmmin routine from lmfit and declares
a suitable callback, so it should never have been in
src/external. Moved it (and its header) to
src/gromacs/correlationfunctions.

Reverted the introduction of gmx_ prefixes in an earlier commit - that
approach permits a third party to link to their own lmfit without
symbol clashes, but doesn't resolve the distribution problem. If both
clients of lmfit can use the same external version, then the problem
is solved, and if each need a different version then there are various
further options. Updated the README accordingly, and noted various
other differences from stock lmfit 6.1.

Added file-level Doxygen for some files. Added mention to install
guide.

The implementation is quiet upon repeat cmake runs, and makes only
advanced or internal cache variables.

Fixes #1957

Change-Id: Ib05eee796c6cf13ea90d456cb9b54b166bfda717

Prevent use of mdrun -maxh -multi

A proper fix can probably be made in release-2016, and if so, the
content of this commit should not be merged forward.

Refs #1942

Change-Id: Ie7e6c0ca25fba09ad1794cacbe116b03e95ff0f9

Updated OpenCLTODOList.txt

Change-Id: I704a5e40b0aa2382fb35e9fb09f99580cd75a020

Fix configuration-time cpuinfo code

Was broken by merge 7139bb20, and left incomplete in 4ce2aa504d
because of the existing duplication of cpuinfo code.

Change-Id: I72c3c12e7b523f03420d251d87271e5e7f87f71b

Removed spurious r_ij in force for Morse potential in manual.

The code was correct though.

Change-Id: Ia3459569de08b3bf1eb79206ff13092a555c27cb

Merge branch release-5-1 into release-2016

Change-Id: I175bb4c5d313a33c0c95446f85b1af2128ce301c

Add nbnxn pair-list rebalancing

The nbnxn CPU pair-lists, and the non-bonded tasks can be unbalanced
over threads, especially for small systems. This change implements
pair-list rebalancing that results in near perfect load-balancing.
This increases the search cost by 3%, but this is outweighed by the
more balanced non-bonded kernel times.

Change-Id: I64e7395127faf193cabe48146e554c696bf76a51

Remove OpenMP overhead at high parallelization

Commit 6d98622d introduced OpenMP parallelization for the for loops
clearing rvecs or increasing rvecs. For small numbers of atoms per
MPI rank this can increase the cost of the loop by up to a factor 10.
This change disables OpenMP parallelization at low atom count.

Change-Id: I0006526568bb387f91e0a373f7ef203b3809f2e7

Correct GPU pair estimate for list splitting

The heuristic estimate for the number of cluster pairs is now too
high by 0-1% instead of 10%. This results in a few percent fewer
pair lists, but still slightly more than requested.

Change-Id: I2e8305a2152913f161a5f47643e8cd7510e81fec

Adjust and document C++11 compiler tests

icc 16.0.0 didn't cope with the previous form, because it wants to use
a base-class copy constructor to implement the return. Disabled the
new move-constructor test for early patch releases of icc 16, but left
it active in the cases where it (is expected to) pass.

Change-Id: Iff3d52c7bc4180a33c495b7c73bb1328bb18764a

Prevent writing to unallocated memory in mk_specbonds

Stop creating new specbonds after at most nspec bonds have been created
to prevent writing to unallocated memory.

Change-Id: I53f9d20059915e7fba8767b92d92fa751e9165e3

Fix bug with fbposres+MPI+OpenMP

The flat-bottom reference coordinates got mixed up when running with
MPI + OpenMP.

Fixes #1969.

Change-Id: Idcac7ae03e1f7018bc7b65de37a3d63abe8ebefc

Update pre-submit matrix contents

Converted a config so that we have one that uses neither MPI. Also
needed an incidental fix for the build script to make that work.

Refs #1693

Change-Id: Ieebc939f6c9cf1d3a84681ce212e61059053cf55

Remove warnings on checkpoint mismatch

mdrun now only warns for mismatch in minor version, build or
number of ranks used when reproducibility is requested.
Also added a separate message for not matching precision.

Fixes #1992.

Change-Id: Ia20e6beff86484f0b70148c155cdb53fed012136

Improve docs for linking stdlibc++ needed by icc

Change-Id: Ibbce9588c090c142ea8b62111818d2788b52a37b

Remove std::thread::hardware_concurrency()

We should not use std::thread::hardware_concurrency() for determining
the logical processor count, since it only provides a hint.
Note that we still have 3 different sources for this count left.

Change-Id: Id536e517419bb33294693d91b6f010d0d5342352

Report the filename and the line number on failure

Extend the call to gmx_fatal in fget_lines() to report the filename and
the line number where the read failed.
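
A sketch of the same idea in a few lines. The reader function and exception name are hypothetical, not the fget_lines() implementation. The error message carries both the file name and the line number at which reading failed.

```python
class LineReadError(Exception):
    """Raised when a file ends before the expected number of lines."""


def read_lines(filename: str, count: int) -> list:
    """Read exactly count lines, reporting where a failure occurred."""
    lines = []
    with open(filename) as handle:
        for lineno in range(1, count + 1):
            line = handle.readline()
            if not line:
                raise LineReadError(
                    f"{filename}:{lineno}: expected {count} lines, "
                    "but the file ended early")
            lines.append(line.rstrip("\n"))
    return lines
```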

Change-Id: Ib5ee06c06111cb61be616a5a4d01339da56a5685

Merge branch release-5-1 into release-2016

Change-Id: I02dae90bd8dfa2279081bc8547ae447b68b30a76

Fix multi-sim + DD + simple distance restraints

The check should not look for the existence of a multi-sim, because
the user must also set GMX_DISRE_ENSEMBLE_SIZE in order to get
ensemble restraints.

Fixes #1989

Change-Id: Id8c9aeefb17583a6ef9ef5caf46232bc384f2ecd

Fix OpenCL error with empty domains

We now don't call the force clearing when there are zero elements
to clear, as can happen with an empty domain with DD.
Also simplified the clearing thread count calculation.

Fixes #1990.

Change-Id: Idc3e42140ac73714475af0918febbf4cac8e43f7

Fix CUDA build with multiple compilation units

An unnecessary include slipped in with the 7139bb20 merge which caused
the (non-default) multiple compilation unit builds to fail. This change
removes the offending include.

Change-Id: I5671df3b64e880f2bed02366ad3a7302647c64dc

Handle constraint errors with EM

All energy minimizers could fail with random errors when constraining
produced NaN coordinates.
Steepest descents now rejects steps with a constraint error.
All other minimizers produce a fatal error with the suggestion to use
steepest descents first.
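
Step rejection in steepest descents can be sketched in one dimension. This toy minimizer is an assumption-laden illustration, not the GROMACS minimizer: a NaN energy stands in for a constraint error, and a rejected step halves the step size instead of being accepted.

```python
import math


def steepest_descent(f, grad, x, step=0.5, iterations=50):
    """Minimize f, rejecting trial steps that fail or do not improve."""
    fx = f(x)
    for _ in range(iterations):
        trial = x - step * grad(x)
        ftrial = f(trial)
        if math.isnan(ftrial) or ftrial >= fx:
            step *= 0.5  # reject the step and try a smaller one
            continue
        x, fx = trial, ftrial
        step *= 1.2      # accepted, so be slightly bolder next time
    return x


def energy(x):
    # NaN for x < 0 mimics a constraint failure producing NaN coordinates.
    return float("nan") if x < 0 else (x - 1.0) ** 2


def gradient(x):
    return 2.0 * (x - 1.0)


print(steepest_descent(energy, gradient, x=3.0, step=2.0))
```

Starting from x=3 with an overly large step, the first two trial positions are negative and are rejected before a valid step is found.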

Fixes #1955.

Change-Id: Ie2f7ad4039634d3c5f2597171ec47d6a145c5fcb

Find python for docs build quietly

Machines that don't have a suitable python don't need
to emit that message upon every run of cmake.

Change-Id: I09025c4a55539ba8c16d0dca486fa5b46a739ba6

Disable static libcudart on OS X

Recent versions of CMake enable a static version of
libcudart by default, but this breaks builds at least
on the most recent version (10.11) of OS X, so we
disable it on this platform.

Change-Id: I2cc9c3dc600c1c72a461c482888f38d33a33eb7a

Merge branch release-5-1 into release-2016

Conflicts:
CMakeLists.txt

Adjacent unrelated changes, trivial resolution

cmake/gmxDetectSimd.cmake

Two unrelated fixes, managing the new GMX_STDLIB_LIBRARIES better, and
working around the fact that we can't use try_run with noisy
compilers. The new context invalidates the suggestion to use try_run
once we require CMake 2.8.11.

cmake/gmxSetBuildInformation.cmake

As above.

src/gromacs/commandline/filenm.h

Adjacent changes caused by introducing Doxygen

src/gromacs/fileio/filetypes.cpp

As above

src/gromacs/legacyheaders/force.h

Change to function signature to pass const char *fn.

src/gromacs/mdlib/forcerec.cpp

Minor clashes from reorganized include files.

TODO make_bonded_tables was refactored in release-5-1 to need
t_filenm, which reintroduces a dependency on
commandline/filenm.h. Decide what to do about this.

src/gromacs/mdlib/mdatoms.cpp

git was confused about the frozen-atom fix because of the indenting
change, but the merge is straightforward in terms of code logic.

src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda.cu

New Doxygen in release-2016 clashed with introducing support for CUDA
6.0/6.1.

src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_kernel.cuh

As above

src/gromacs/tables/forcetable.cpp

Change to function signature to pass const char *fn.

src/programs/mdrun/tests/CMakeLists.txt

Adjacent new files introduced in both branches

Change-Id: Iaaffacc186aa5ff67c83522d2c07b05afeec75b2

Re-add support for linking against external TinyXML-2

I always intended this support, to permit convenient packaging of
GROMACS by distributions, but it got lost from gerrit while rebasing
from patch set 4 to 5 of I6153136.

Fixes #1956

Change-Id: Ie76dc9e8c6116814439d813d5a9555c5bfb7bfc5

Updates to documentation

Moved TNG management code into its own file, and called from a normal
place, so that the minimum required TNG version can be automatically
documented.

Change-Id: I4223a6339d635311cbe013e21c757e4065580271

Added manual section about embedding proteins to membranes.

Very short paragraph referencing needed papers and linking to user guide for details.

Fixes #1932

Change-Id: I0511576f06be35f1727a22c0d1e5f6552f1ae06d

Fix data race in hwinfo with thread-MPI

Fixes #1983.

Change-Id: Ic44d2c1e595796132127364900d4d995379b3175

Add support for CUDA CC 6.0/6.1

This change adds build-system and kernel generator support for the
Pascal architectures announced so far (GP100: 6.0, GP104: 6.1) and
supported by the CUDA 8.0 compiler.

By default we now generate binary as well as PTX code for both sm_60 and
sm_61 and given the considerable differences between the two, we also
generate PTX for both virtual arch. For now we don't add CC 6.2 (GP102)
compilation support as we know nothing about it.

On the kernel generation side, given the increased register file, for
CC 6.0 the "wider" 128 threads/block kernels are enabled, on 6.1 and
later the 64 threads/block remains.

Some macros that were incorrectly left behind by the adbada4 fix had to
be eliminated from the CUDA host code because these caused double
definitions.

Change-Id: I7f465651125fe135255ce5c84db644c62caeea6b

Check number of items read in mdp statements

Added checks for the number of items read in all
sscanf() statements processing data from the mdp
file.
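
A minimal sketch of the kind of check described above, assuming a
hypothetical helper parseThreeReals() and a hypothetical Vec3 type (neither
is taken from the GROMACS sources). std::sscanf() returns the number of
items it converted, so comparing that count with the expected one catches
truncated or malformed values:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical type for illustration only.
struct Vec3
{
    double x, y, z;
};

// Hypothetical helper, not a GROMACS function: parses exactly three
// reals from an mdp-style value string and rejects anything shorter.
Vec3 parseThreeReals(const std::string& value)
{
    Vec3 v{};
    // sscanf returns the number of items successfully converted,
    // or EOF if the input ends before the first conversion.
    const int numRead = std::sscanf(value.c_str(), "%lf %lf %lf", &v.x, &v.y, &v.z);
    if (numRead != 3)
    {
        throw std::invalid_argument("Expected 3 values but read "
                                    + std::to_string(numRead) + " from '" + value + "'");
    }
    return v;
}
```

Calling parseThreeReals("1.0 2.5") throws instead of silently leaving the
third component at its default value.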

Fixes #1945.

Change-Id: Iecb5e3c7018570fb3a299624ac936c26d03294eb

Write OpenCL build log prior to checking build status

If building an OpenCL kernel fails, we still want to let the user know
why the build process failed. Therefore the code has to write the build
log prior to throwing a build failure exception.

Change-Id: I2a8881895379da9ce4b13cf34788357347f1050c

Handle partially frozen and constrained atoms

Atoms frozen along some, but not all dimensions would still be moved
along all dimensions by constraints. Now such dimensions are frozen.
Note that the initial configuration might not obey the constraints,
which leads to conflicting demands of freezing and constraining.
Partially frozen atoms in the initial configuration will still be
constrained along all dimensions (but will be frozen during the run).
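
The intended behaviour during the run can be sketched as follows. This is
an illustration only: the function name, the types, and the idea of masking
a displacement are assumptions, and the actual constraint code may achieve
the same effect differently.

```cpp
#include <array>

using Vec3       = std::array<double, 3>;
using FreezeMask = std::array<bool, 3>; // true means frozen along that dimension

// Hypothetical helper: apply a constraint displacement to an atom,
// but leave every frozen dimension untouched.
Vec3 applyConstraintDisplacement(const Vec3& position, const Vec3& displacement, const FreezeMask& frozen)
{
    Vec3 result = position;
    for (int d = 0; d < 3; ++d)
    {
        if (!frozen[d])
        {
            result[d] += displacement[d];
        }
    }
    return result;
}
```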

Fixes #1960.

Change-Id: Ic4d43a2840fabc084aec4237abf6a589eaa72f37

Fix compatibility with older CMake

LINK_LIBRARIES was introduced in CMake 2.8.11. We don't test anywhere that
CMake hardware detection works.

Change-Id: Icb5812c3971219085b8e91634e4cd08451041118

Make more CMake variables advanced

This keeps the ccmake display down to things users are reasonably
likely to want to change.

Fixes #1764

Change-Id: Ia78387bb32f5d6c7f6b82e453b155fdf657a044a

Ignore stderr when detecting SIMD and build info

Cray and some other stupid compilers echo extra stuff to stderr when
compiling a normal source file. Unfortunately, the standard CMake
try_run cannot ignore stderr.

Refactored some try_run calls to try_compile + try_run. This
incidentally makes build-host detection run faster because the same
binary is not compiled multiple times. The results of the detections
are now set up so that an initial default value is constructed,
replaced if the compilation and run both succeed, before finally being
cached.

Noted some TODOs for future cleanup of the duplication in such code.

Fixes #1881.

Change-Id: Icb770872f853cff3ef60426d9980f463341eb1cf

Add env. var. to enable OpenCL caching

OpenCL binary caching is disabled by default (due to concurrency and
versions not being safely handled), but during development, being
able to avoid the kernel compilation overhead can speed up testing and
benchmarking tremendously. This change adds an environment variable
that can be used to manually trigger binary caching.

Change-Id: I5879acf040216ad75dbb54e7ecec001aae8af8a5

Fix OpenCL binary cache file name generation

The binary cache name was generated from the full path prefixed
with a constant string, which is not a valid filename.

Also added assertion on the file pointer passed to fclose.

Change-Id: I838cf9a2e19385afaf26b80a15be2b74973d3a4c

Fix ICC 16u3 build

The C++11 check was triggering a bug in a corner case (empty
defaulted move constructor), which wasn't present in the actual
code.

Change-Id: I650580497fe2ad9b67777d2f8a216cdb2c14bf12

Ensure IMD include dependencies work

Makes sure select() and the necessary struct types are declared.

Fixes #1978

Change-Id: I3f7e545c95f26b5cf0398ed14e6206c08ba9735d

Improve make_ndx help text

Clarify the use of boolean operators. The old help text could
incorrectly hint that AND, OR, and NOT would work as keywords.
Add a reference to gmx select that in most cases can serve as a
replacement.

Fixes #1976.

Change-Id: I0284c849c398e5b09569453d7d0f19b9639a6d0c

Fix reference to vmdio.c to be vmdio.cpp

Fixes #1979

Change-Id: I441ad6ace31325a6804f6031f32a1ca4600766e5

Second 2016 beta release

Change-Id: I6350da03d04580da4b1bd11dc66fa79dbef9a47e

Reposition nbnxn filler particles

As particles are now allowed to overlap in the nbnxn kernels,
the ugly, complicated scheme to create different coordinates for
the filler particles in nbnxn cells can be removed.

Change-Id: Ic25c3b4bc82065b2b6b25f721d34eda3e6c23c0e

Avoid numerical overflow with overlapping atoms

The Verlet kernels did not allow overlapping atoms, even if they were
not interacting (in contrast to the group kernels). Fixed by clamping
the interaction distance so it cannot become smaller than ~6e-4
in single and ~1e-18 in double, and when this number is later
multiplied by zero parameters it will not influence forces. The
clamping should never affect normal interactions; we would previously
crash for distances that were this small.
On Haswell, RF and PME kernels get 3% and 1% slower, respectively.
On CUDA, RF and PME kernels get 1% and 2% faster, respectively.
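
The effect of the clamp can be sketched with a plain Lennard-Jones
expression. The function name and the minRSquared parameter are
hypothetical, and the floor value is left to the caller rather than taken
from the kernels. Without the clamp, rSquared == 0 gives an infinite
inverse distance, and multiplying that by zero parameters yields NaN
instead of zero.

```cpp
#include <algorithm>

// Hypothetical scalar sketch, not a GROMACS kernel: returns F/r for
// V(r) = c12/r^12 - c6/r^6, with the squared distance clamped from below.
float pairForceOverR(float rSquared, float c6, float c12, float minRSquared)
{
    // Clamp so that the inverse below stays finite for overlapping atoms.
    const float r2    = std::max(rSquared, minRSquared);
    const float rinv2 = 1.0f / r2;
    const float rinv6 = rinv2 * rinv2 * rinv2;
    // With c6 == c12 == 0 every term is zero times a finite number.
    return (12.0f * c12 * rinv6 * rinv6 - 6.0f * c6 * rinv6) * rinv2;
}
```

For two overlapping non-interacting atoms, pairForceOverR(0.0f, 0.0f, 0.0f,
1e-6f) returns zero, while distances above the floor are unaffected.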

Fixes #1958.

Change-Id: I83b88f0e9ca34dc151a8b907f334a95a1a4301cc

Add check for finite energies

Added a check for finite total potential energy. This check is nearly
free and can catch issues with incorrectly set up systems before users
get a confusing constraint or PME error. Note that this check is only
performed at steps where energies are calculated, so it will often
not catch an exploding system.
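
A minimal sketch of such a check, assuming a hypothetical function name
and error text (the real code reports errors through its own
infrastructure):

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical helper: throws if the total potential energy is
// infinite or NaN at a step where energies were computed.
void checkPotentialEnergyIsFinite(double potentialEnergy, long long step)
{
    if (!std::isfinite(potentialEnergy))
    {
        throw std::runtime_error("Step " + std::to_string(step)
                                 + ": the total potential energy is not finite; "
                                   "the system may be incorrectly set up");
    }
}
```

The cost is a single comparison per energy step, which is why the check
can be called nearly free.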

Change-Id: If33245a96cecae78c9077a825cb22335c853a810

Add grompp check for unbound atoms

grompp now prints a note for atoms that are not connected by
a potential or constraint to any other atom in the same moleculetype,
since this often means the user made a mistake.

Refs #1958.

Change-Id: Iabb00563c76a9f7954f84d89d1c67d438f2c31ff

Simplified and updated OpenCL compilation

Moved into gmx and new ocl namespace, updated variable naming, updated
string handling, treated many more error conditions, also with
exceptions, used more RAII, used more of the standard GROMACS
utility infrastructure.

Removed some string database functions that existed merely to be
looked up once.

Changed to write OpenCL build log to file pointer provided by the
caller, if needed, rather than a separate file. This currently uses
stderr, so can't yet work well with multiple ranks, but neither did
the old approach. We need a proper MPI-aware logging module, first.

Separated the caching functionality into its own source file. Changed
the naming of binary cache to reflect the name of the kernel source
file whose binary is being cached. Noted further requirements if we
would re-activate caching at some point, but since it is still
de-activated, this is not worth further effort now.

Removed the requirement that we must be able to read source code, if
instead a binary cache is available.

Required that compileProgram compile kernels for the vendor of the
target device. This was always the behaviour, but there is no reason
to be able to select alternative things there.

Simplified the passing of preprocessor defines required by the caller
of compileProgram to the JIT compilation.

Removed use of GMX_OCL_FORCE_CPU in log file coordination, as CPU
OpenCL devices are not supported.

Refs #1720

Change-Id: I25e78526f55715c779819e96d6bf6b52ad9394c6

Fix build with clang 3.1+3.2

Change-Id: I8bffaa081d4cf52662e46fcd6c58667b4528fa04

Detect the usage of hwloc XML caching

The environment variable HWLOC_XMLFILE can be used
to speed up hwloc detection e.g. on Xeon Phi. Documented
in the user guide, and made sure we detect when such
caching was used to avoid strange future bug reports.

Fixes #1946.

Change-Id: Id99385fb7cc1e2692fb9b06fa187424058aaa213