BioD PNPI Git Repos - alexxy/gromacs.git/log

Fixed typo in description of conversion factors.

Change-Id: I5f9875ef94e8d5526f84c029f539fab642319a04

Fixed shift and switch modifiers, particularly for free-energy

When using tabulated interactions (historically with PME-Switch), the
previous free-energy kernels used tabulated interactions which gave
correct results. However, as we have moved to using the new
interaction modifiers, Ewald short-ranged interactions are computed
analytically. To extend the range over which we apply the soft-core
interaction, the free-energy kernels evaluated interactions by
subtracting the reciprocal-space component, and then applying the
free-energy evaluation to the Coulomb (1/r) short-range
interaction. This works fine for vanilla PME, but led to problems when
combined with a switch modifier, since we are switching a different
function compared to the non-free-energy kernels. This could lead to
large artefacts where the free energy was 100x off if we were applying
the cutoff to r while the switch was applied to the scaled soft-core
radius.

This patch modifies the free-energy kernel so that the vanilla, shift,
and exact-cutoff versions still use the compensation trick, while the
switch modifier always operates on the traditional short-range Ewald
functional form.

The (very small) Ewald shift has also been added when computing free
energy in combination with Ewald summation and potential-shift
modifiers. As the perturbation goes to zero, the interaction will also
approach the non-free-energy interactions. Tested to match the
non-free-energy kernel to within 1e-8 in the fully coupled state; it
conserves energy and produces reasonable free energies for ethanol in
water.

This also modifies the table-generation, table-usage, and
dispersion-correction code to use the shift/switch forms (correctly)
when these have been selected as interaction modifiers. This
provides much more accurate results for our new shifted interactions.

Correct (unmodified) tables are now generated for 1-4 interactions
in a few corner cases in the presence of modifiers for non-bonded
interactions.

Code paths for using exact cutoffs now work correctly when
rcoulomb-switch != rvdw-switch, or if only one kind of switch is
active.

Free-energy calculations using a plain Coulomb interaction now
incorporate a potential shift if one exists.

The GMX_NB_GENERIC environment variable can now be used to specify the
use of the generic kernel even with shifts or switches active.

Fixes #1463.

Change-Id: Ia63a1ed7d6c9cdf9cd9e6209b6326a49043060ec

Fix constraint virial with multiple time stepping

With multiple time stepping the additional nstcalclr-1 force
contribution was constrained to remove it from the virial.
This procedure neglected the non-linear contribution due to
the rotation of constraints. Now the contribution of this force
component to the coordinate update is constrained instead and
the corresponding virial contribution is subtracted from the
constraint virial.

Fixes #1400

Change-Id: If3217f52808bf7491998324f8dc3161bc003ec1b

Updated C-/N-terminal partial charges in Amber03.ff.

At the time of porting, the AmberFFs were validated against
AMBER 8 and the results matched precisely. However,
that specific AMBER version had a bug due to which CT/NT charges in ff03
were in fact using ff94 charges. The bug correspondingly propagated
to the Gromacs ports. In newer versions of AMBER this has been fixed.

The current GROMACS patch uses charges as specified in the
all_aminoct03.lib and all_aminont03.lib files as
taken from the AmberTools14 distribution.

In that distribution (AmberTools14) there seem to be no updates to the ff9x parameters.

Fixes #1466.

Change-Id: Ie6cfea5702500ff6cd5019edb22f224d29135425

Fixed g_energy Einstein viscosity

The g_energy -vis Einstein viscosity output was obviously incorrect.
This bug was introduced in version 4.5.
Fixes #1516

Change-Id: I5ffc1a232f0c64769cc438c977b757e8d8b55b98

Fix memory issue in solvate

Caused by SIMD padding introduced by new group kernels.

Fixes #1499

Change-Id: I5126217c9b752f1c1fd04d01e2644987fdc52d5b

Normal modes don't work currently with virtual sites or shells.

Refs #879.

Change-Id: I1c45b5a4b4c97feff222dccbbbb884e0153ad0c5

Add fatal errors for VV and twin-range MTS

Michael never implemented the multiple-time stepping with the VV
integrator family and constraints (see code that calls
combine_forces() from update_coords() in src/mdlib/update.c). Probably
that means the multiple-time-step regime was not tested with VV
either. Strictly speaking, these new fatal errors have a wider scope
than is clearly warranted, but it is not clear that the
no-constraints VV path was at worst as broken as the leap-frog
path is (see #1400).

I suspect VV+constraints will work with the incoming fixes for
leap-frog, but until someone wants to use it (and why would they?),
I'm not going to test that it works as well as it does with
leap-frog.

Change-Id: Ib61d0fb7661bca2101c04423a6af1744420c06ab

Improvements to the g_lie help description.

Information was sparse, likely to avoid restating published
protocols, but some expansion was definitely warranted,
given the number of questions that have recently been
posted to the mailing list and Redmine.

Refs #1353

Change-Id: Ib4057ae671ccac70061498b1d40ebcba84c497ee

Added a note about using direction-periodic pulling.

Refs #1352.

Change-Id: I867fe1372082063bb221880e8021089f9ded14e7

Added a note about unsupported Verlet cutoff + Buckingham.

Refs #1192.

Change-Id: I2da3bf8c768de40531e1ffdf6f8cec73c9e53314

Fixes a complicated bug in g_anaeig.

If the number of frames for a covariance analysis is fewer than the
number of degrees of freedom, g_covar would happily print nonsense
eigenvalues in the eigenval.xvg file. This would then lead g_anaeig
to give NaN entropy values. By limiting the number of output lines in
the eigenvalue file this should be resolved. In addition a warning is
printed.
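With N frames, the mean-subtracted covariance matrix has rank at most N - 1, so no more than min(N - 1, ndof) eigenvalues can be non-zero. A minimal sketch of limiting the output accordingly and warning. The function name is hypothetical, and the exact limit used by g_covar may differ (for example if fitting removes further degrees of freedom).

```c
#include <assert.h>
#include <stdio.h>

/* Number of eigenvalues worth writing: a covariance matrix built from
 * nframes mean-subtracted frames has rank <= nframes - 1. */
static int n_meaningful_eigenvalues(int nframes, int ndof)
{
    assert(nframes >= 1 && ndof >= 0);
    int rank_limit = nframes - 1;
    if (rank_limit < ndof) {
        fprintf(stderr,
                "WARNING: %d frames give at most %d non-zero eigenvalues "
                "for %d degrees of freedom\n", nframes, rank_limit, ndof);
        return rank_limit;
    }
    return ndof;
}
```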

Change-Id: I01693a0fa9f3ba5b5784543a04d0d88b33a755c2

Prohibit AVX_256 with buggy gcc 4.6.1

Fixes #1259

Change-Id: I7c523a90ccd0dd8fd3df6cb1914429e22a27ab5d

Fix clang 3.5 warnings regarding *abs*

Fix abs() family functions type usage mismatch.
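In C, calling abs() on a double silently converts the argument to int and truncates it, which is the kind of mismatch clang warns about. A short illustration of matching each member of the abs() family to its argument type. The helper names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* double: fabs, not abs (abs would convert a - b to int first). */
static double abs_diff_d(double a, double b)
{
    return fabs(a - b);
}

/* long: labs. */
static long abs_diff_l(long a, long b)
{
    return labs(a - b);
}

/* float: fabsf, avoiding a round trip through double. */
static float abs_diff_f(float a, float b)
{
    return fabsf(a - b);
}
```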

Change-Id: I85ed2931d681aa1ad024678b4209a524abc2cc61
Signed-off-by: Alexey Shvetsov <alexxy@omrb.pnpi.spb.ru>

cmake: enable shared libraries by default on Hurd

Since the toolchain is mostly the same as used on e.g. GNU/Linux
(glibc, gcc, binutils, etc), they can be safely enabled.

Change-Id: Ic663c8c344b5dfa2910e9c30260588e1917038c1

Removed truncation of nrdf in v-rescale thermostat

The resampling function for the v-rescale thermostat expected an
integer value for nrdf, but a real was passed, which was truncated.
With a single coupling group nrdf is analytically an int,
but could be off by a bit. This could lead to incorrect kinetic
energy fluctuations (averages were correct).
Now fractional nrdf's are properly handled for nrdf > 3.
For nrdf < 3 a check is added for integer values with a small margin
for rounding.
Fixes #1218

Change-Id: I4c60c337f9874d0bff51220ad09429140be2a056

Fix memory error in asc format IO

Found with cppcheck 1.64

Change-Id: I72e80cd2b11559be47d449d0a8d444857843979b

Issue warnings for potential switch w/o defining switch regions.

If either older style switches or new potential-switch modifiers are
used, issue a warning if the PME region is too long, resulting in
inaccurate energies. Also issue a warning if rvdw_switch is 0, which
occurs if no value is specified.

Discussion is in redmine #1463

Change-Id: I7a2c29f87ceb04712aab5076ac97f5f22f573671

Fixed g_rmsdist NOE calculation

This fix solves a series of bugs in NOE calculations:
1) incorrect number of frames in the calculation of the average distance
2) wrong selection of equivalent atoms
3) -equiv was wrongly documented

Change-Id: Icc6bbed0e1ba65774ad470c3edbd7a6a96e63ee6

Fix invalid dereference in g_tune_pme

Change-Id: If4c999e22799ab2087ff1a1d74d8e5b50ddddd57

Make sure water optimization is disabled for esoteric interactions

This fixes a bug where the generic NB kernel could be called with
a water-water neighborlist for a few special combinations where
no C kernels existed (e.g. switch/shifted plain-cutoff coulomb).
This would typically lead to virtually no nonbonded interactions
being calculated and the simulation crashing rapidly, so it is
unlikely to have affected any results silently, but was noticed
when testing interaction forms.

Change-Id: I634fc4ab78b54281c89333299975e25883dc1f2c

Adds cut-off checks for triclinic domain decomposition

With domain decomposition and 2 decomposition cells in a triclinic
dimension, the cut-off could be longer than the size of the
communicated domains. This could cause some pairs close to the cut-off
distance to be ignored in the force/energy calculations.
Fixes #1467

Change-Id: Id7e16d7f8fa0796d6adcf48ad6e8bbb0b88039ff

Fixed bug in parallel v/f constraining

Constraining v or f with 3 or more decomposition domains in one
or more dimensions could lead to modification of communicated
v and f components by the box size for inter charge-group constraints.
Fixes #1462

Change-Id: Idece9d2d0d8f48e65a654d5c2892fbe1ff836ba0

House-keeping for MTTK

Leapfrog + MTTK silently produced a .tpr that is probably broken;
certainly the documentation only states support for velocity-Verlet
integrators.

Added note about deprecation and planned removal of MTTK + constraints.

Refs #1292

Change-Id: Iec2cf0dd866242735ce04a954e585b2461f6e701

Avoid cross product with zero vector in rotational pulling.

Fixes #1431 (rotation/flex-t regression test failing on BG/Q)

In do_flex_lowlevel() we checked (by mistake!) for xj-xcn being
zero, although we need to check for yj0-ycn being zero, since
we use yj0-ycn in a cross product in the following lines of code.
I now also replaced the direct check (0 == norm(...)) by checking
what gmx_numzero(norm(...)) returns. The latter replacement
was also applied in the do_flex2_lowlevel() routine. Note that there
the check for small xj-xcn was and is actually correct.

Change-Id: I972b6d67a81e30f297db286cd2224f66753a20aa

The van der Waals radius must equal the Coulomb radius with Verlet

This patch fixes an issue that can occur when using g_tune_pme
with a Verlet pair-list .tpr input file. If the .tpr itself has
large cutoffs (e.g. 1.2 nm) and one asks g_tune_pme to scale
_down_ the Coulomb radius, the van der Waals radius is not
scaled down with the Coulomb radius (only upscaling worked).
One ends up with an unusable .tpr file because rVdW != rCoul.
This patch ensures that van der Waals and Coulomb radii
are always equal with Verlet pair-lists. Fixes #1460. Thanks
to Joao Rodrigues for reporting the issue!

Change-Id: I5ef30e71a35cd83838040057e16e52e09ea82e9a

Fix incorrect grid cell size in g_sas -nopbc

Fixes #1445

Change-Id: I798fc8fe96608633f26d9a3500f83f39b44af008

Add fatal error for Andersen+constraints+DD

This combination produced a temperature that was 6.5 degrees higher
than the same with one domain, for a 1ns PME lysozyme in water
simulation.

Change-Id: I9f80276c47de955a5053bcabb6fe7c9bfdceaf0e

Improved CUDA non-bonded kernel performance

Some old tweak which was supposed to improve performance had in fact
the opposite effect. Removing this tweak, and with it the shared
memory bank conflicts it caused, improved performance by up
to 2.5% in the force-only CUDA kernel.

Change-Id: I7fcb24defed2c68627457522c39805afc83b3276

Permit warning-free use of Andersen thermostat

Simulations using Andersen thermostat don't need to remove center of
mass motion, because that is intrinsic to the algorithm. Using it with
nstcomm > 1 generated a warning that center-of-mass removal is
unnecessary. Using it with nstcomm == 0 generated a warning that
ice-cube artifacts might be generated. Resolved by suppressing the
latter warning for both Andersen thermostats.

Change-Id: I6b2c1594cabd81964b3ba1ebc5dd61e0f1debb5e

portability aspects in install guide + minor tweaks

Added information on portability aspects related to CPU instruction
sets, related to #1428.

Additionally, made several minor updates and tweaks related to
compilers, platforms, cmake, etc.

Change-Id: I621262c939c119e5bdd5e7c91dda0ae3ffc60b7b

Reinstate shell code with DD

Further work on the complex/sw test case in the 5.0 regressiontests
branch reveals that the initial conditions may have been the reason
for the problems observed with DD and more than one node, rather than
the implementation.

Refs #1429

Change-Id: I26ff6d9f8c79605afa794cae4761b5643b712124

added gmx_is_{single,double}_precision

* allows easy detection of the precision for cmake, autotools
  without parsing the output of gmx_print_version_info (no
  cross-compile support) or the output of strings command (unix
  only)
* linking against libgmx with/out -DGMX_DOUBLE will lead to
  unpredictable segfaults

Change-Id: I472f10ae374a1f42c94c55e156b53f8905bdf098

Fix aligned store to unaligned memory

Also fixes that unaligned store was used when not necessary.

Change-Id: I44bb222a07ec0af65198667787b8673b3c6cd2e7

avoid mdrun crash when rdtscp is not supported

When using rdtscp, mdrun now detects at runtime whether the CPU supports
this instruction and if this is not the case, it issues a fatal error
and instructs the user to recompile mdrun for the compute host. Note
that this will happen rarely, only when cross-compiling from a newer
host for a rather old one.
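On x86, RDTSCP support is reported in bit 27 of EDX for CPUID leaf 0x80000001. A sketch of such a runtime check using the <cpuid.h> header that ships with GCC and clang. It is an illustration, not the GROMACS implementation, and it reports no support on non-x86 targets.

```c
#include <assert.h>
#include <stdbool.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Returns true if the running CPU supports the rdtscp instruction. */
static bool cpu_has_rdtscp(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the extended leaf is not available. */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx) == 0) {
        return false;
    }
    return (edx & (1u << 27)) != 0;
#else
    return false;
#endif
}
```

A binary built to time with rdtscp could call such a check at startup and stop with a fatal error, asking for a rebuild, when it returns false.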

Additionally, when the user manually picks AVX, we also turn on RDTSCP
as all AVX-capable CPUs support it.

Also made GMX_USE_RDTSCP an advanced CMake cache option. This replaces
the previously hidden GMX_DISTRIBUTABLE_BUILD option.

Fixes #1428

Change-Id: I8bc884ef9ea8ea4661626b60490182ae2b302648

Added safety check for fitting group in anaeig.

Previously, g_anaeig would not check the number of atoms in the selected
fit group against the number of atoms in the reference structure if this
was read from the eigenvector file (g_covar adds the reference structure
to the eigenvector file if fit and analysis group are identical). As a
result, anaeig would run out of bounds when selecting the atoms for
fitting, reading random values from memory.

This simple check should prevent this behaviour by terminating anaeig
with a fatal error similar to the one that is invoked if the group
selected for analysis has an incorrect number of atoms.

Change-Id: I63a1e1629144e539808d95d867e0ad0673480fdf

Issue fatal errors rather than use broken shell code

Refs #1429

Change-Id: I18a17f1e232a86a13f4e3b591bd992702af3017b

cmake eats slashes

Change-Id: I5ea157c4a5e9df2212643b49ba9b270ffd9a6978

Keep clang Address Sanitizer happy

Allocating 15 bytes with the 8-byte aligned memory at offset 8 of
15, would overflow the buffer, which would be fairly likely to
have no effect. But ASan notices this if you run it on AVX hardware,
unlike the Jenkins build which runs on SSE4.1. The good news is
that this fix is enough to make all the existing tests pass under
ASan on AVX.
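The general rule behind fixes of this kind is that a buffer read with full-width SIMD loads must be padded to a whole number of SIMD widths. A minimal sketch of that rounding plus an aligned allocation via C11 aligned_alloc(). The helper names and widths are illustrative, not taken from the GROMACS source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of width (width > 0). */
static size_t round_up(size_t n, size_t width)
{
    assert(width > 0);
    return ((n + width - 1) / width) * width;
}

/* Allocate n floats padded to a multiple of simd_width and aligned to a
 * full SIMD register, so aligned loads never run past the end.
 * The padding is zeroed. Returns NULL on failure. */
static float *alloc_simd_padded(size_t n, size_t simd_width, size_t *padded_n)
{
    size_t np = round_up(n, simd_width);
    float *p = aligned_alloc(simd_width * sizeof(float), np * sizeof(float));
    if (p != NULL) {
        memset(p, 0, np * sizeof(float));
        *padded_n = np;
    }
    return p;
}
```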

Change-Id: I61ff11687709e096c70a162d3514227cb243561d

cmake: added FFTW_URL to allow easy offline build

Change-Id: I9904ce03e0ee1b377e4961c1f8481fc98c10cba4

Remove unused fplog

Keeps gcc 4.8 build happy

Change-Id: I392b02c0950ead04c414dffd6340b364b804b7aa

Unify logic for timing counts

Made all integrators use the same logic for starting timing
mechanisms.

Change-Id: Id8cb154f7b96d977efffcc9533d4a6dd9894afbd

clarified OpenMP-related things in mdrun help/man

Added note on OMP_NUM_THREADS/GMX_PME_NUM_THREADS env vars and
improved description on the use-cases when MPI+OpenMP improves
performance.

Change-Id: I904f00c8a4b6907a006b9d4367406d3fa3f3ce42

Fixed precision in thermal expansion coefficient calc.

Loss of accuracy was caused by different sampling
of volume and enthalpy and as a result alpha was
computed incorrectly. With the present "fix" the volume
and enthalpy are both downsampled to what is written
in the .edr file. The real fix would be to store the product
of H and V in the .edr file, but that falls outside the
4.6 branch policy.
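The standard fluctuation formula, alpha_P = (<VH> - <V><H>) / (k_B T^2 <V>), pairs V and H sample by sample, so both must come from the same time points. A minimal sketch in GROMACS units (nm^3, kJ/mol, K). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Boltzmann constant in kJ mol^-1 K^-1. */
#define KB_KJ_MOL_K 0.008314462618

/* alpha_P = (<V H> - <V><H>) / (kB T^2 <V>) from n samples of volume V
 * and enthalpy H taken at the same time points. Result in 1/K. */
static double thermal_expansion(const double *V, const double *H,
                                size_t n, double T)
{
    assert(n > 0 && T > 0.0);
    double sv = 0.0, sh = 0.0, svh = 0.0;
    for (size_t i = 0; i < n; i++) {
        sv  += V[i];
        sh  += H[i];
        svh += V[i] * H[i];
    }
    double v = sv / n, h = sh / n, vh = svh / n;
    return (vh - v * h) / (KB_KJ_MOL_K * T * T * v);
}
```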

Change-Id: I1be06d689002d7c9d6be92bf1e377912f0be1efd

Checkpointing fix for Native Client

Native Client doesn't allow file renames.  We can over-write output
files, however.  For checkpoints, live dangerously and skip backups.
The alternative would be to use an in-memory file system, but then we're
still screwed if the program gets killed partway through writing the
on-disk version.  Other alternatives:  keep all checkpoints.

Change-Id: I952ee6436e69f015633a150f94fca65c7271c6bb

Patch for Native Client builds.

This patch contains the source changes necessary to compile Gromacs
for Native Client. Patch is based on original work by Ivan Krasin,
additional changes from Joseph Coffland.
Also included are a few compiler warning fixes and a minor FAHCORE
tweak.

Change-Id: I085c52ff1d8e45ec8ffb8c56f5877313d6225bb2

cmake: make GMX_BUILD_OWN_FFTW work without fortran compiler

Fixes #1412

Change-Id: I4739c112630ad7e264ce314d2da0b29932ea3041

Pass on default value of radstep in make_edi

This is a bugfix for make_edi when -radfix is chosen. Problem was:
if the user did not specify -radstep, then the default value of 0
was not written to the sam.edi output file. Now it is. Also
renamed "radfix" variable to "radstep" because that better reflects
what it is.

Change-Id: I0cc6ee84d42b18ee0ea6b045cdfb0c1d55d51b9f

Essential dynamics: move bNeedDoEdsam evaluation to separate function

The bool variable edi->bNeedDoEdsam is used to signal whether any essential
dynamics constraints have to be evaluated for the ED group. This variable
was evaluated at the beginning of an ED simulation in write_edo_legend().
The latter is however not called if continuing from checkpoint. To
get rid of having to remember whether edi->bNeedDoEdsam was already
initialized, there is now a function bNeedDoEdsam(edi) that is evaluated
every time it is called.
Also corrected a few typos.

Change-Id: Iab899a677a85ee8270354859c98cc9e5a9db34b7

Fixed essential dynamics (ED) continuation from .cpt for reference=average

ED runs where the reference and average structure indices are the same can
crash when continued from checkpoint. For these cases (where reference =
average atom indices) obviously the set of atomic positions for the reference
and average structures is always identical. Therefore, only one of the two
structures is stored (which is the average structure edi->sav). When reading
the old values of these structures from the checkpoint file, edi->sav.x_old
therefore needs to be copied both to xstart and xfit in init_edsam().

Change-Id: Ieb1f029f4a927999dfb4579ee7c3bebe15071dc8

Fixed compilation issue due to gcc4.8

by turning off warnings.

Change-Id: Ice3dd8dec8cb9dc590fb293c1face3ed603f7abb

Fix non-critical typo in #ifdef GMX_OMPEN_MP

Added #include to make it work.

Change-Id: Icea244c4fb63aee6ae67a29370d08177a66129a8

Version bump after 4.6.5 release

Change-Id: I1d1c1ee28d585b6cf4431f9f1ec1a334f68ae6e3

Version bumps before release

Change-Id: I5b5ea233c47ce95474dae3b0f71a4ae6ae704f6c

Fixed return value of gmx_mtop_bondeds_free_energy

The return value was always true, which was harmless, since it
could only cause a small performance hit of useless sorting.

Fixes #1387

Change-Id: I088a3747ddb3517fbb5e416b791bd542bd49fed2

Fix DD load balancing bug with GPU sharing

The recent DD load balancing fix which solved the issue of incorrect
imbalance measure with GPU sharing (ba8232e9) addressed GPUs with
incorrect indexing. This caused out of bounds indexing in the GPU ID
query function. The query function also had a bug in the error checking
which allowed the incorrect indexing.
Now also mdrun -nb cpu -gpu_id ... is allowed, which before would give
a fatal error.

This commit addresses both issues; fixes #1385

Change-Id: I2800f610b873da92afe78bbfd869258f378ba2d7

Fix incorrect variable name in documentation

Change-Id: I312e3886ebc692f2331ac2f9a612d530b5d4914c

corrected potential nbnxn SIMD memory issue

A fixed size array on the stack was declared with one element
too few. Probably this never caused trouble with 64-bit builds,
but it might have caused trouble with 32-bit builds.

Change-Id: I4dad0a7a9e80f5d27ac6ee7e4383082db654481a

Bump version after 4.6.4 release

Change-Id: Ied0463a471657e39cb6c4c41d6112f5778ef00d5

Fix minor things before release

Bumped various version numbers

Trivial fix to install guide

Removed out-of-date gmxfaq.html and links to it, replaced links with
links to up-to-date FAQ

share/html/online.html is generated by mkhtml, so stopped caching it
in the repo.

Change-Id: I52265e1174f6e42a2a9d056c3a1751c1cd5886ac

Clarified GPU selection output and mdrun help

The reporting of GPU selection has been confusing when devices are
shared as GPU IDs would show up multiple times in a list of "devices
selected to be used." The reporting has been modified to print the
number of devices selected followed by the GPU to PP rank mapping which
is in fact exactly the previously printed list of IDs.

Additionally, the mdrun help page now explicitly states that the GPU ID
string passed with -gpu_id specifies a per-node GPU to PP rank mapping
and that multiple ranks can share GPUs.

Change-Id: Id98c592c1dd38573df003247281e4edf50debba7

Added rotation to the tests run by ctest

Was forgotten when the rotation tests were added and thus
they weren't run by Jenkins

Change-Id: I27fc51b1314e6377d1e866a8ba4658700cc71cfa

fix nbnxn atom sorting with distant bondeds

Atoms communicated for bonded interactions can be beyond the non-local
search grid. Only a single cell extra was accounted for, which could
give inconsistency errors. Now any distance is handled correctly.
Fixes #1379

Change-Id: I7b12efeeab4074f2b356c0d0739105ce38371901

corrected dynamic load balancing when sharing GPUs

When sharing GPUs over MPI ranks, the time the GPU is busy might not
reflect the actual load. To make the dynamic load balancing between
domains work correctly, the GPU wait times are now redistributed over
the ranks/domains sharing a GPU.

Change-Id: Id9414e3ef7cc5a73a2b4560a0e10c2ee8ab1257f

enable GPU sharing among tMPI ranks

It turns out that the only issue preventing sharing GPUs among thread-MPI
threads was that when the thread arriving at free_gpu() first destroys
the context, it is highly likely that the other thread(s) sharing a GPU
with it are still freeing their resources, an operation which fails as
soon as the context is destroyed by the "fast" thread.

Simply placing a barrier between the GPU resource freeing and context
destruction solves the issue. However, there is still a very unlikely
concurrency hazard after CUDA texture reference updates (non-bonded
parameter table and coulomb force table initialization). To be on the
safe side, with tMPI a barrier is placed after these operations.

Change-Id: Iac7a39f841ca31a32ab979ee0012cfc18a811d76

GPU detection is done once per physical node

Only one MPI rank in each physical node now runs the GPU detection.
The resulting information is broadcast to the other ranks.
Note that we should also implement this for the CPU detection.
Fixes #1358

Change-Id: I16c6ccc40bd53d96b99d3f6a0abed69cc89136d8

removed (harmless) left-over in nbnxn SIMD kernels

This improves performance of PME + p-coupling by about 5%.
With Ewald and virial, the nbnxn SIMD energy kernels were used
(some left-over development code). The plain-C code did not do this.

Change-Id: I039044fcb393bf0bcaa06f38498b2a57d60cf080

reorganized GPU detection and selection

The GPU selection has been separated from the GPU detection
and now happens after the thread-MPI threads are started.
The GPU user/auto-selected options have been removed from
gmx_hw_info_t, such that it only contains hardware info
and can be passed around as const.
As both the CPU and GPU options structs are now tMPI rank local,
tMPI thread concurrency issues are avoided.
Fixes #1334 #1359

The GPU detection is now skipped with mdrun -nb cpu
CPU acceleration binary/hardware mismatch is now only printed once
to stderr (instead of #MPI-rank times to stdout).
Removed the master_inf_t struct.

Change-Id: If497f611b911808f6d01ca83f41ae288061dd361

Rename GMX_IS_* to GMX_TARGET_*

This addresses some confusion that developed between release-4-6 and
master with me trying to develop the kernels in master branch so I
could have unit testing support, and then cherry-pick them back. I
had intended to solve #1269 in a separate commit, but it didn't happen
that way.

As #1269 discusses, the code sometimes needs to know what architecture
is being targeted by the compiler. This information is held in the
GMX_TARGET_X86 and GMX_TARGET_BGQ CMake and preprocessor
variables. Note that this information is distinct from what CPU
acceleration is being used (which might be "None" on either platform).

gmx_cpuid.c needs GMX_TARGET_X86 defined to work correctly on x86, and
is called at configure time (at which time config.h is
unavailable). So, in CMake this is handled via a command-line
definition of GMX_TARGET_X86 when required.

Fixes #1269 (even though I98c5791ec silently did this already)

Change-Id: I94e0756856e7d49ff09a87b8283189976b48ea49

corrected volume with serial NPT replica exchange

Replica exchange with replicas run in serial would only update
x and v, not the other state data. This gave incorrect volumes
with NPT replica exchange.
Fixes #1362
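A sketch of why exchanging only x and v is not enough under NPT: the box travels with the configuration, so the whole state has to be swapped. The struct layout is hypothetical and far smaller than the real state.

```c
#include <assert.h>

/* Hypothetical, minimal per-replica state. */
typedef struct {
    double x[3];     /* one particle position, for brevity */
    double v[3];     /* one particle velocity */
    double box[3];   /* rectangular box edges */
} replica_state;

static double volume(const replica_state *s)
{
    return s->box[0] * s->box[1] * s->box[2];
}

/* Exchange the complete state of two replicas, box included. */
static void exchange_state(replica_state *a, replica_state *b)
{
    replica_state tmp = *a;
    *a = *b;
    *b = tmp;
}
```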

Change-Id: Ib726fbb75e800c624ef61f31e76a5d4a4e408b9c

improved the nbnxn buffer size estimate with GPUs

The nbnxn Verlet buffer estimate now takes into account that
constrained atoms rotate, and don't move linearly, around the atom
they are constrained to. This significantly lowers the buffer size
estimate for long neighborlist lifetimes (as used with GPUs).
The buffer for most CPU runs is not affected (significantly).
Because of the smaller buffer, mdrun now uses smaller list increase
limits for increasing nstlist when using GPUs. This improves
performance.

Also activated and tested the virtual site effective mass calculation
(vsites were ignored in the drift calculation).

Change-Id: I2cb349f483610eabcc97bfbc23d17f189dec19d6

Fix NBNxN SIMD reference kernels

nbfp_stride was added independently by both 25eb0e14 and 5deee8a0.

Removing static is not OK for gcc. Mark will resolve later whether
this was even needed for his upstream work.

Change-Id: I97ea4131163512354b5e339dd19549c3e49e9de2

fixed recent bug with CUDA texture objects

On GPUs with CUDA architecture 3.0, mdrun would exit with an error.
This bug was introduced very recently in 43b41cb8
Fixes #1361

Change-Id: I0c46867b987cbf3c0da3aa9384d985fef1e4aa73

fixed OpenMP threads being pinned to the same cores

Due to the thread id not being a thread-local variable in the OpenMP
loop setting the thread affinities, different OpenMP threads could be
pinned to the same physical cores.
Fixes #1360
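A sketch of the thread-local pattern in OpenMP: declaring the thread id inside the parallel region gives each thread its own copy. A single shared variable written by all threads would race, and two threads could end up with the same core. The function name and the core numbering are assumptions. No affinity is actually set here; the sketch only records the core each thread would be pinned to.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 256

/* Fill core_of_thread[tid] = core_offset + tid for every OpenMP thread and
 * return the number of entries written. Works serially without OpenMP. */
static int assign_cores(int core_of_thread[], int max_threads, int core_offset)
{
    int nthreads = 1;
#pragma omp parallel
    {
        int tid = 0;   /* private: declared inside the parallel region */
#ifdef _OPENMP
        tid = omp_get_thread_num();
#pragma omp single
        nthreads = omp_get_num_threads();
#endif
        if (tid < max_threads) {
            core_of_thread[tid] = core_offset + tid;
        }
    }
    return nthreads < max_threads ? nthreads : max_threads;
}
```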

Change-Id: I7bc39aef9a8854ec24006895da6005c1326033a3

BlueGene/Q Verlet cut-off scheme kernels

The kernels are implemented with small functions whose inlining
is guaranteed by the use of xlc and clang extensions. That's a hack
whose general solution I plan to implement in master branch.

Other BG/Q considerations:

Architecture detection now works on A2 core.

Install guide updated.

It is better to use intra-node communicators than not, and ranks
within nodes are correctly detected via querying the BlueGene/Q API,
since the hostname is not useful for the purpose.

It is better to not set GMX_DD_SENDRECV2.

It is better to use the analytical Ewald correction.

In principle, we should version the type of variables and fields named
d2, rl2, rbb2 in nbnxn_search*[ch] to be double on PowerPC and float
everywhere else (each regardless of GROMACS target precision). This
would mean that on PowerPC (where all flops take place in double
precision with free precision-extension upon load) we can be both
cache-efficient by storing bounding boxes in float, and flop-efficient
by not having to generate a round-to-single instruction to compare the
result of subc_bb_dist2_simd4 with the cut-off stored as a
float. Still, a flop per bounding-box distance comparison will not
break the bank.

Enough bgclang support exists for the build to succeed (no platform
file is required), even with OpenMP, but a number of compiler issues
have been reported on llvm-bgq-discuss mailing list.

Change-Id: I98c5791ec3766cdbdcb8a8eb7418d00585727cc0

Call atomics from TestAtomic.c

This exposes more compile-time errors than simply parsing
the definitions. This makes CMake's diagnostics more useful
with respect to atomic operations.

Fixes #1355

Change-Id: Ie1d6f14565700b98988cadc17cb7ac2b78d76ce3

Fix tMPI_Atomic_memory_barrier for MIC

MIC doesn't have sfence. It isn't required because the current generation
of MIC is in-order.

Change-Id: I6953bc3168a191a3038408e6ea35025a25509abe

Fix typo in g_membed documentation

Suggested by Iman Pouya

Change-Id: I5c77a29b64e61f9da5a663119e149d992141eb21

Fix SIMD C reference nbnxn kernels

Got broken by ace006a86 and 022581b388.
An additional fix for nbnxn 4x8 reference code, broken by c0cf8ce,
is in a separate patch.
Also changed the AVX256 double precision nbfp_stride from 4 to 2.

Refs #1173

Change-Id: If3b3291a7ff765acc19c29f834e856cc9798d47e

Restarting from checkpoint no longer reinitializes WL weights.

Fixes a problem where mdrun was reinitializing the initial Wang-Landau
delta for expanded ensemble simulations, because the flag turning it
off was stored in the expanded ensemble data structure (not saved in cpt)
instead of the df_history structure (which is saved in the checkpoint). In the
process, the df_history structure and the expanded ensemble methods
were moderately encapsulated.

Fixes #1350

Change-Id: I13492a7a9773fcb417fcd0ee106d851d9838ce25

avoid division by zero in SIMD angles and dihedrals

The SIMD accelerated angle and dihedral code did not (correctly)
check for dividing by zero, which can happen with aligned bonds.
Fixes #1351

Change-Id: I326f90fca87ab5cca493204de4a58655465634ca

turning off expanded ensemble for all integrators but md-vv.

Broke at some point, and somewhat tricky to turn back on
correctly for other integrators at this point; target for
5.0 when it should be more straightforward.

fixes #1321

Change-Id: I599b308800411e0cea111ffd280487037d613755

fixed a half bin misalignment in gmx_vanhove -or

Change-Id: Ia800861912d50f5047742bcb1bb51e753920968f

Fixes a problem with pair type 2 interactions with free energy

Pair type 2 interactions, which should remain on regardless
of couple-intramol=yes, were being turned off. Previously, when free
energies were turned on, they were just ignored, because the (empty)
pair type 1 list was copied over them. This fix addresses
this problem by adding onto the list instead of copying it over.
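A sketch of the difference between copying over a list and appending to it: appending an empty list leaves the existing pair type 2 entries in place. The list layout (two atom indices per pair) and the function name are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pair list: 2 * n atom indices. */
typedef struct {
    int *ia;
    int  n;
} pair_list;

/* Append all pairs of src to dst instead of overwriting dst.
 * Returns 0 on success, -1 if the reallocation fails. */
static int pair_list_append(pair_list *dst, const pair_list *src)
{
    if (src->n == 0) return 0;
    int *p = realloc(dst->ia, (size_t)2 * (dst->n + src->n) * sizeof(int));
    if (p == NULL) return -1;
    memcpy(p + 2 * dst->n, src->ia, (size_t)2 * src->n * sizeof(int));
    dst->ia = p;
    dst->n += src->n;
    return 0;
}
```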

Fixes #1315

Change-Id: I240479a8dc083f7a355917ed9f74f4337fa3448f

make use of CUDA stream priorities

CUDA 5.5 introduced stream priorities with 2 levels. We make use of this
feature by launching the non-local non-bonded kernel in a high priority
stream. As a consequence, the non-local kernel will preempt the local
one and finish first. This will improve performance in multi-node runs
by reducing the possibility of late arrival of non-local forces.

Change-Id: I4efc65546e4135f12006c0422e1fca42a788129f

use CUDA texture objects when supported

CUDA texture objects are more efficient than texture references; their
use reduces the kernel launch overhead by up to 20%. The kernel
performance is not affected.

Change-Id: Ifa7c148eb2eea8e33ed0b2f1d8ef092d59ba768e

introduced general 4-wide SIMD support

PME spread+gather and the nbnxn search bounding box checks use
4-wide SIMD (as opposed to arbitrary width SIMD). This SSE code
has now been replaced by macros from gmx_simd4_macros.h.
pme_sse_single.h has been renamed to pme_simd4.h
This change is mainly refactoring; it only adds PME spread+gather
AVX acceleration in double precision plus a few FMA instructions.
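The commit does not show the macro definitions. A minimal sketch of the idea, using made-up names (simd4_real, simd4_load, simd4_add, ...) rather than the actual contents of gmx_simd4_macros.h: kernels are written once against a 4-wide interface, which maps to SSE intrinsics when the compiler supports them and to a plain-C reference implementation otherwise.

```c
#include <assert.h>

/* Hypothetical 4-wide SIMD layer; names are illustrative only. */
#if defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 simd4_real;
#define simd4_load(p)      _mm_loadu_ps(p)
#define simd4_set1(x)      _mm_set1_ps(x)
#define simd4_add(a, b)    _mm_add_ps(a, b)
#define simd4_mul(a, b)    _mm_mul_ps(a, b)
#define simd4_store(p, a)  _mm_storeu_ps(p, a)
#else
/* Plain-C reference implementation: same semantics, one lane at a time. */
typedef struct { float v[4]; } simd4_real;
static simd4_real simd4_load(const float *p)
{
    simd4_real r;
    for (int i = 0; i < 4; i++) { r.v[i] = p[i]; }
    return r;
}
static simd4_real simd4_set1(float x)
{
    simd4_real r;
    for (int i = 0; i < 4; i++) { r.v[i] = x; }
    return r;
}
static simd4_real simd4_add(simd4_real a, simd4_real b)
{
    for (int i = 0; i < 4; i++) { a.v[i] += b.v[i]; }
    return a;
}
static simd4_real simd4_mul(simd4_real a, simd4_real b)
{
    for (int i = 0; i < 4; i++) { a.v[i] *= b.v[i]; }
    return a;
}
static void simd4_store(float *p, simd4_real a)
{
    for (int i = 0; i < 4; i++) { p[i] = a.v[i]; }
}
#endif

/* Example kernel written only against the simd4_* layer:
 * y = a*x + y, for n a multiple of 4. */
static void saxpy4(int n, float a, const float *x, float *y)
{
    assert(n % 4 == 0);
    simd4_real va = simd4_set1(a);
    for (int i = 0; i < n; i += 4)
    {
        simd4_real r = simd4_add(simd4_mul(va, simd4_load(x + i)),
                                 simd4_load(y + i));
        simd4_store(y + i, r);
    }
}
```

The kernel itself contains no intrinsics, so switching the backend (SSE, AVX, plain C) only touches the macro layer.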

Change-Id: Ia5e02295bb281a2e23d57f4c165f555de6744064

fixed nbnxn 4x8 pair search without AVX

This bug was introduced recently.
Note that nbnxn 4x8 without AVX was only possible when manually
changing the code to use plain-C reference SIMD.

Change-Id: I5effe4076bc5ff270ebeb366f9c2b8a13c256025

Fix parallel build for GMX_BUILD_OWN_FFTW

* only works for cmake >= 2.8.8
* cmake 2.8.7 has a bug in add_library, cmake 2.8.[0123] have other
problems, and cmake 2.8.[456] still don't build in parallel
* fix from https://gerrit.gromacs.org/#/c/1675/12
* hardcode libdir to fix build on openSUSE

Change-Id: I74315880f71fd4384084819ccc686072f7cad4f5

Fixes reaction field free energy bug

Version 4.6 introduced an error in the reaction-field correction
force term for perturbed interactions: a factor r_softcore^2 was
missing. The force calculation code has been slightly reorganized
and comments have been added to avoid such issues in the future.
Fixes #1318
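The kernel code is not shown here. As a sketch, assuming the standard reaction-field form V(r) = qq*(1/r + k_rf*r^2 - c_rf) with the electrostatic prefactor folded into qq, the force times distance is F*r = -r dV/dr = qq*(1/r - 2*k_rf*r^2). The r^2 on the k_rf term is the kind of factor that is easy to drop; for a perturbed pair the same expression would plausibly be evaluated at the soft-core radius. Function names are hypothetical.

```c
#include <assert.h>

/* Reaction-field Coulomb potential (prefactor folded into qq):
 * V(r) = qq*(1/r + k_rf*r^2 - c_rf). */
static double rf_potential(double qq, double k_rf, double c_rf, double r)
{
    assert(r > 0.0);
    return qq*(1.0/r + k_rf*r*r - c_rf);
}

/* Force times distance, F*r = -r*dV/dr = qq*(1/r - 2*k_rf*r^2).
 * Note the r^2 on the k_rf term; for a perturbed pair r would be the
 * soft-core radius (an assumption about how a kernel applies it). */
static double rf_force_times_r(double qq, double k_rf, double r)
{
    assert(r > 0.0);
    return qq*(1.0/r - 2.0*k_rf*r*r);
}
```

Checking the force against a finite difference of the potential, as the test does, is a cheap way to catch a missing factor of this kind.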

Change-Id: I9105139f8975495c323008ce202cde517a69281a

Silence clang warnings

Pre-release clang 3.4 warns that the type of the lout[23] variables
is not the real * expected with GMX_MPI_REAL.

Change-Id: Id3ca4567f5eb642ead0cb4ce8d48dafbb92c303a

allow compilation to optimize for CUDA compute cap. 3.5

Enabling optimizations targeting compute capability 3.5 devices
(GK110) slightly improves performance of both PME and RF kernels.
This requires a hint for the compiler optimization indicating
the maximum number of threads/block and minimum number of
blocks/multiprocessor. This change allows nvcc >=5.0 to generate
code for CC 3.5 devices and switches to including PTX 3.5 code
(instead of 3.0) in the binary.

Change-Id: If7e14d31165bc05859250db7468bf6bd8c186264

Corrected info text. -center --> -boxcenter

Change-Id: I99901047fcde55f9714c81d3182a3778f290ebac

Fixed limitations in g_cluster

The old version produced wrong output for large trajectories with more
than 46340 frames. The number of RMSD matrix entries, which is the
square of the number of frames, was stored as an int, which overflows
INT_MAX beyond that size. By changing it to gmx_large_int_t, g_cluster
is now able to handle trajectories with up to 3e9 frames.
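The frame limits follow from simple arithmetic: 46340^2 = 2147395600 still fits in a 32-bit int (INT_MAX = 2147483647), while 46341^2 = 2147488281 does not; with 64-bit storage the limit is sqrt(2^63 - 1), about 3.04e9 frames. A small sketch, using int64_t as a stand-in for gmx_large_int_t and hypothetical function names:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of entries in an nframes x nframes RMSD matrix, computed in
 * 64 bits (int64_t standing in for gmx_large_int_t). */
static int64_t rmsd_matrix_entries(int64_t nframes)
{
    /* sqrt(INT64_MAX) is about 3037000499.98, so larger counts overflow. */
    assert(nframes >= 0 && nframes <= 3037000499LL);
    return nframes*nframes;
}

/* 1 if the entry count would still fit in a plain int (the old storage). */
static int fits_in_int(int64_t nframes)
{
    return rmsd_matrix_entries(nframes) <= INT_MAX;
}
```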

Also freed leaking temporary buffers.

Change-Id: I8acfb0cedae9ddde207f39cb627ad2ea9fbbb9e6

logic fix for free energies with mdrun -rerun

The code took the wrong logical path when checking whether
delta_lambda = 0 during mdrun -rerun.

fixes #1330

Change-Id: I3dadbb546b4376fae72c1b00c0684450bf77396f

Drop md5sum check for GMX_BUILD_OWN_FFTW

The old version gave a confusing error message about a wrong md5sum if
the download failed.

The new version no longer checks an md5sum at all, which avoids the
need to test for a particular CMake version. It also gives an explicit
warning and instructions on how to proceed safely.

CMake bug reported at http://www.cmake.org/Bug/view.php?id=14330
Noted TODO to revisit if that bug gets fixed.

Noted TODO in master branch to show this warning only the first time a
suitable cached variable is set.

Change-Id: I403896505b178251087d71f95362c3754cd4a2de

Fix bug in (long) neighborlist SIMD padding when adding to previous list

Gromacs-4.6 introduced SIMD padding in the neighborlists, which works
fine for normal simulations. However, when the neighborlist gets long
and we end up adding a second batch of particles, we need to remove the
previous padding, which was not done until now. This will typically only
occur when the list per node is large, e.g. when using long cutoffs
(>2nm) with only a single core. Normal simulations should not have been
affected by it (which is also why we did not find it until now).
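The padding problem can be illustrated with a toy list. This is a sketch with hypothetical names (jlist_t, jlist_add_batch, PAD_ENTRY) and a fixed capacity for simplicity, not the GROMACS neighborlist code: each batch is padded to a multiple of the SIMD width, and before a second batch is appended the previous padding must be stripped, otherwise dummy entries end up between real ones.

```c
#include <assert.h>

#define SIMD_WIDTH 4
#define PAD_ENTRY  (-1) /* hypothetical dummy index used for padding */
#define MAX_LIST   64

typedef struct {
    int j[MAX_LIST];
    int n;      /* entries including padding */
    int nreal;  /* real entries only */
} jlist_t;

/* Append a batch of real entries and pad the list to a multiple of
 * SIMD_WIDTH. The padding from the previous batch is removed first. */
static void jlist_add_batch(jlist_t *l, const int *j, int nj)
{
    l->n = l->nreal; /* drop the previous padding */
    assert(l->n + nj + SIMD_WIDTH <= MAX_LIST);
    for (int k = 0; k < nj; k++)
    {
        l->j[l->n++] = j[k];
    }
    l->nreal = l->n;
    while (l->n % SIMD_WIDTH != 0)
    {
        l->j[l->n++] = PAD_ENTRY; /* fill the last SIMD register */
    }
}
```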

Fixes #1341.

Change-Id: Ie64ab6c0313a8dc0d3545a5e7d610f24adae4438

optimized generic SIMD invsqrt

The function gmx_invsqrt_pr now uses one fewer instruction when
FMA is not supported in hardware.
Fixes #1333
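Generic SIMD inverse square roots are commonly computed as a low-precision hardware estimate refined by Newton-Raphson iterations; the commit does not show which arrangement gmx_invsqrt_pr uses. A scalar sketch of the refinement step, with the well-known bit trick standing in for the hardware estimate (function names are hypothetical). For f(y) = 1/y^2 - x the update is y' = y*(1.5 - 0.5*x*y*y); algebraically equivalent forms such as 0.5*y*(3 - x*y*y) differ in how many multiplies and adds they need without FMA.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Crude estimate of 1/sqrt(x) for x > 0 (the classic bit trick), standing
 * in for a hardware reciprocal-sqrt estimate instruction. */
static float rsqrt_estimate(float x)
{
    assert(x > 0.0f);
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    float y;
    memcpy(&y, &i, sizeof y);
    return y;
}

/* One Newton-Raphson step for f(y) = 1/y^2 - x; it roughly squares the
 * relative error of the estimate. */
static float rsqrt_newton(float x, float y)
{
    return y*(1.5f - 0.5f*x*y*y);
}
```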

Change-Id: Idace7296b88a8ecc0331e22d5bb3088753c478de

fixed multiple distance restraints with OpenMP

Distance restraints with multiple pairs (the same label) are no
longer split over multiple OpenMP threads. Some (beneficial)
reorganization of the bonded thread division was required to do this,
most importantly: removed calc_one_bond_foreign.
Fixes #1316
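The constraint of not splitting restraint pairs that share a label can be sketched as follows. This is an illustrative division scheme with hypothetical names, assuming pairs with the same label are stored contiguously; it is not the actual bonded thread division code. Each naive even split point is moved forward until it no longer falls inside a label group.

```c
#include <assert.h>

/* Divide n restraint pairs over nthreads. label[i] is the restraint label
 * of pair i (equal labels assumed contiguous). Thread t handles pairs
 * start[t] .. start[t+1]-1; start must have room for nthreads+1 values. */
static void divide_restraints(const int *label, int n, int nthreads, int *start)
{
    assert(nthreads >= 1);
    start[0] = 0;
    for (int t = 1; t < nthreads; t++)
    {
        int s = (int)((long long)n*t/nthreads); /* naive even split */
        if (s < start[t - 1])
        {
            s = start[t - 1]; /* keep boundaries non-decreasing */
        }
        /* Never cut between two pairs that belong to the same restraint. */
        while (s > 0 && s < n && label[s] == label[s - 1])
        {
            s++;
        }
        start[t] = s;
    }
    start[nthreads] = n;
}
```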

Change-Id: I88d8eafede5cbc26c19026a9272639e652f7abd7

Described another way g_tune_pme might reasonably fail

Change-Id: Ibb75f40a17b81934ae768a57d5e4fb11d07cdc2d