BioD PNPI Git Repos - alexxy/gromacs.git/log

Add dynamic pair-list pruning framework

This change adds the logistics for setting up dynamic pruning of the
nbnxn pair-lists. Dynamic pruning allows for an increase in nstlist
and rlist while computing fewer pair interactions in the non-bonded
kernel. This comes at the cost of running an extra pruning kernel
every few steps and added communication due to larger buffers.

The kernels, documentation and heuristic for choosing nstlist
will be added in separate changes.
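
As a rough illustration of the idea only (this is not the nbnxn code, and
the pruning kernels are explicitly not part of this change), the sketch
below prunes a pair list built with a large outer radius down to the pairs
currently within a smaller inner radius. The names prunePairList and
rlistInner, the flat pair layout and the lack of periodic boundaries are
all assumptions made for the example.

```cpp
#include <array>
#include <utility>
#include <vector>

using Vec3     = std::array<double, 3>;
using PairList = std::vector<std::pair<int, int>>;

// Squared distance between two points (no periodic boundaries in this sketch).
static double distanceSquared(const Vec3& a, const Vec3& b)
{
    double d2 = 0;
    for (int d = 0; d < 3; d++)
    {
        const double diff = a[d] - b[d];
        d2 += diff * diff;
    }
    return d2;
}

// Hypothetical helper: returns the pairs of the outer list that are
// currently within rlistInner. The outer list is left untouched, so it
// can be pruned again a few steps later with updated coordinates.
PairList prunePairList(const PairList& outerList, const std::vector<Vec3>& x, double rlistInner)
{
    PairList     pruned;
    const double r2 = rlistInner * rlistInner;
    for (const auto& p : outerList)
    {
        if (distanceSquared(x[p.first], x[p.second]) <= r2)
        {
            pruned.push_back(p);
        }
    }
    return pruned;
}
```

The non-bonded kernel would then loop over the shorter pruned list, while
the outer list only needs rebuilding at the (now less frequent) search
steps.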

Change-Id: Id8040b95f812df60f117279267bf551ff4ac8d79

Clean up nbnxn cluster pair addition code

Partial clean up of the nbnxn CPU makeClusterList functions.
Clarified variable names and added more documentation.
No functional changes, except for the removal of setting the start
of the column range to the diagonal element, which led to more checks
but did not affect the resulting cell range.
This is preparation for a bug fix.

Change-Id: Ib0a32087d205a23ebef85394d226f084ba515c24

Parse user-supplied GPU task assignment only when needed

Now there is never both a gpuTaskAssignment and a
userGpuTaskAssignment present, which could have been misused or
misunderstood.

Also fixed somewhat related outdated naming and documentation
for gmx_count_gpu_dev_unique().

Change-Id: I09d27857b6a91a49735850655bdf26af4bd88445

Remove duplicate GPU non-usage hint

5bef33832c removed the behaviour where mdrun would hint to the user
that compatible GPUs were found, even though the simulation would not
use them (for various possible reasons). That was unintended and
wrong. However, that report was already duplicated for both rerun with
energy groups, and the group scheme, by logic called earlier from
runner(). So we can just remove this hint.

Minor cleanup to the way local variables are created and used.

Change-Id: I7a5f61c3384dba5eec9f049169f3c32f3d00f4f8

Moved logging functionality into GPU usage report

This moves the logic about whether a GPU usage report is written into
one place.

Noted TODO to resolve bug created by the formerly split logic.

Change-Id: Ieef7853994b4a84a2e45a06598fe852e8f7ac6f9

Add OpenMP for orientation restraints

Recent commit 0be497b7 disabled OpenMP for orientation restraints.
Now the restraint indexing issue with OpenMP is resolved by copying
the type numbering solution from the distance restraint code.

Change-Id: Id3af147c940b84afd542a26736235ece94101c7a

Use graph with orientation restraints

With the Verlet cut-off scheme by default molecules are not made whole.
Now they are made whole when orientation restraints are used.
Added checks and assertions for correct PBC treatment with orientation
restraints.

Fixes #2228.

Change-Id: Ib33294cb9b0b0d131b0c385c001b7cb73c006ba9

Remove loop over all atoms in init_mdatoms

We should avoid non-parallel loops over all atoms in the system.

Change-Id: I7052f7f203e7d191377ab939e26f9dcd713a9998

Clarify data flow in GPU task assignment

Previously, hw_opt.gpu_opt->dev_use could be filled with data at two
widely separated points, which was harder to understand and maintain
than this alternative.

The possible user-supplied task assignment is now copied into the
vector for task assignment that is later used by mdrunner to configure
the modules that do the work. The automated assignment fills the same
data structure, but the semantics are improved by using a vector, and
copying the user assignment is now trivial.

userGpuTaskAssignment is now part of hw_opt like other user options,
and stored in a string. Eventually the ordinary-data type hw_opt will
be used in a const-like manner, so this change helps with that.
gmx_hw_opt_t now cannot be allocated with snew, but there is no longer
any reason to consider doing so.

Together, these eliminate gmx_gpu_opt_t because it is now merely a
vector (for now).

The parsing of the string from mdrun -gpu_id is simplified now that
its result is stored in a vector (and tests added). This aspect is
likely to get more complicated when a user assignment might look
different, e.g. "01:2,3:45" if/when we have two ranks each doing two
PP tasks and one PME task on different GPUs. Moved the parsing code from
detecthardware.cpp to hardwareassign.cpp, where it makes a bit more
sense.
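
A minimal sketch of what parsing into a vector can look like, assuming
each character of the string is a single-digit GPU ID. The function name
parseGpuIdString is hypothetical and is not the function in
hardwareassign.cpp.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not the GROMACS function: converts a string such
// as "0011" into {0, 0, 1, 1}, one GPU ID per character.
std::vector<int> parseGpuIdString(const std::string& gpuIdString)
{
    std::vector<int> gpuIds;
    gpuIds.reserve(gpuIdString.size());
    for (const char c : gpuIdString)
    {
        if (c < '0' || c > '9')
        {
            throw std::invalid_argument("Invalid character in GPU ID string: " + gpuIdString);
        }
        gpuIds.push_back(c - '0');
    }
    return gpuIds;
}
```

Returning a vector by value keeps ownership simple, and a copy of a
user-supplied assignment is then an ordinary vector copy.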

Related trivial cleanup in gmx tune_pme, that uses the same parsing
machinery for GPU IDs, so that it can manage suitably.

Removed getCompatibleGpus() from header file, since its usage is now
local to hardwareassign.cpp. Similarly, anyGpuIdIsRepeated() is no
longer used.

Change-Id: I1071281d3348c9cce05e7ac39a24775611ccc0dd

Use table Ewald for Skylake

Change-Id: I433acabb9269594465259d94eb01d34c313996dc

Fix include dependency for getenv

Formally, this is required for calling getenv to work, but somehow
Jenkins passes it and my macports clang doesn't. That's presumably
related to some transitive include of stdlib.h, but in any case we
should do this robustly.

Change-Id: I27ac743112e03feb14c49e14ead04302b084cf58

Simplified handling of GPU device IDs

Removed gmx_gpu_device_id lookup function, whose indirect lookup of
the device ID was more complicated than we need. We already require
the user to select GPUs based on the IDs as understood by CUDA, so for
simplicity, and without loss of generality, we can do that everywhere
in the code.

Removed a previously unused function for querying the name of an
OpenCL device.

Change-Id: I6557dd51f6b23591d4fd2b6383deb6abf58a0b92

Simplify resource-division efficiency check

The code only needs a boolean result of whether any rank is using a
GPU, not a sum of GPUs used or GPU tasks present, or whatever.

It also doesn't need to be coupled to the implementation of
gmx_hw_opt_t.

One assertion was redundant with an earlier check.

Change-Id: I0adae7a9b6c612fe4b4b8e974b71c59a593a64a4

Split printhardware from detecthardware

These have little in common, so should be separate for better
readability and maintainability. Resolves an existing TODO.

Changed a HOSTNAMELEN to STRLEN, just in case hostnames are long.

Noted a new TODO to use MPI_Get_processor_name in printing the
detected hardware.

No other code changes, just movement and include minimization.

Change-Id: Ifd48695964bd7340e957fb55bcb2f66c89e75d1b

Decouple task assignment from task execution

Code that needs to run on a GPU does not also need to know about the
code and data structures that underpin task assignment. The outcome
of task assignment is the information about which GPU to use, and it
is simple and effective to give just that result to the code that
needs it.

Simplifies t_forcerec.

Added more const correctness for gmx_device_info_t pointers.

Change-Id: I094c19e08be73af998bd287e43d5c2b6e5969a60

Stop using hwinfo_g for reporting code

These are passed an actual handle, and should use it, since
they are not participating in the lifetime behaviours.

Change-Id: Iaa4407622a01d0599034c5aea3628a38ffd5a97a

Removed unused include statements

After recent cleanup, these can be decoupled

Change-Id: I2071e1a4eb67faff591b941167b80e4445ded7e2

Fixed bug in gmx disre indexing arrays.

The call to routines in listed-forces/disre.cpp returns values
in a different manner than previously. This meant that gmx disre
would crash since it expected a value to be non-zero.

Change-Id: Ib7008a9655f7e4a8b56ab1319bab9b2104273721

Clean up matrix inversion

This change is mainly a conversion of the orientation restraints
matrix storage from an array of pointers to a real [5][5] struct.
This requires a small change to the jacobi code which also affects
gmx_traj (in a positive way).

Change-Id: I8a960b659e8da847d1e505844535bd6fbc984814

Fix orientation restraint reference

The resetting of the COM of the molecule with orientation restraints
for fitting to the reference structure was done with the COM of the
reference structure instead of the instantaneous structure. This does
not affect the restraining (unless ensemble averaging is used), only
the printed orientation tensor.

Fixes #2219.

Change-Id: I4984ee7f64780a5c3850feb4bfe4a624afd5cec7

Worked around missing OpenMP implementation in orires.

The orientation restraint code is not aware of OpenMP threads
and uses some global information. By only running it on the
master node results are now independent of number of threads
used.

Fixes #2223

Change-Id: Ie86f4bd4e645fa71a58114950f6a297b5788e022

Make loop variables in orires.cpp local

Also moved and renamed some variable declarations.

Change-Id: I804a1addde81950576ce4a49006ff00ea29db3ab

Add check for nr of orientation restraints

Change-Id: I78be294cd1c73db1cdd296e636dfc2806fa90c75

Introduced Mdrunner high-level class

This class does not yet conform to style, e.g. for naming of member
variables, because the immediate objective is to support refactoring
of setup code to permit evolution of seams that could be used either
from an API caller, or a test driver. Appending a lot of underscores
would cause a lot of useless work rebasing other patches that touch
this code, and would make this patch rather larger. A good future
refactoring for this setup code would be to move most of the member
variables to be set up by modules that implement the hypothetical
ICommandLineOption interface.

Further, keeping the names the same in this patch means that we can
check with -Wshadow that there are no symbols that share the same name
in a nested scope (which is an error I made when developing this
patch).

Ported the command-line filename argument storage to std::array, so
that default copy assignment (etc.) works correctly. The storage for
the C-style memory pointed at by the individual t_filenm data is still
duplicated by the same call to dup_tfn().

Now that hw_opt is a member variable, several of the uses that modify
it need to take its address, and several of the const uses of it can
take a const reference, per style. This will make it easier to see
those places where we modify a data structure that ought to contain
only the things that the user selected.

Noted some TODOs for future improvements

Change-Id: I15c308e54ee34541818854cac029998f9e5520ff

Added unit-test for orientation restraints output.

In order to allow splitting of the gmx energy functionality
into an NMR and an energy part additional tests are needed.
The regression test tests the content of the edr file but
not the extraction to xvg files, which is what this test
contributes.

The test is complicated by intricate differences between single-
and double-precision files, which were cleaned up.

Had to add memory cleanups in multiple locations as well, e.g.
for cleaning cmap data structures. Never touch untouched code.

Change-Id: I0a6ee05f38bd198c9a3c37f7f837a28ce77b74e0

Provide STL make_unique template function.

make_unique is not available in the ISO standard library spec
corresponding to C++11, but the implementation from the
standards document is small and benign.

src/gromacs/compat/make_unique.h provides this implementation
in the gmx::compat namespace.
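
For reference, the single-object form of such a function is only a few
lines. The sketch below uses a placeholder namespace named sketch and
omits the array overloads; it is not a copy of the GROMACS header.

```cpp
#include <memory>
#include <utility>

namespace sketch
{

// Single-object form only; the array overloads described in the
// standard are omitted to keep the example short.
template<typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

} // namespace sketch
```

A call such as sketch::make_unique<std::pair<int, int>>(1, 2) forwards
its arguments to the constructor and never exposes a raw owning pointer
to the caller.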

Change-Id: I275c33ad6c821e8a8006531dc650b3be6df0a6e5

Enhance cmake and releng handling of GMX_USE_RDTSCP

It is possible for a physical or virtual CPU to support avx and not
rdtscp, or the other way around, so the previous workaround was always
a bit hacky.

This change refactors the implementation to preserve the existing
behaviour of both explicitly setting GMX_USE_RDTSCP to be ON or OFF,
and the default behaviour of relying on the rdtscp detection, unless
we should make a guess associated with an explicit user choice of SIMD
level.

It adds GMX_USE_RDTSCP=DETECT, which is now useful in Jenkins, where a
hypervisor does expose AVX instructions and not RDTSCP, so the above
fallback is not very effective for (e.g.) matrix configurations where
we specify the SIMD level. Various jobs now use either OFF or DETECT
according to the intent of the job. This will be useful in migrating
verification jobs between hosts without having hidden
incompatibilities.

Switched usage of HAVE_RDTSCP to be always defined, for better
checking by the compiler for any errors made.

Improved status messages when the setting changes, and why, if it
depends on the detection result.

Change-Id: I932c1764d91ec317475ef71400802e7b8d07c887

Inform users about the end of run DLB state

The user will not know whether the run did DLB or not without looking
through the entire log or standard error. Hence, this change adds a note
that tells the user about the state of DLB at the end of the simulation.

The note on the fraction of the step used for load balance calculation
has also been improved.

Change-Id: I7a152a26453c00bb0433b738a69fe08ec9caa760

Fix bugs with orientation restraints

The orientation restraint initialization got moved to before the
initialization of the domain decomposition, which made the check
for domain decomposition fail.
Also fixed orientation restraints not working with the whole system
as fitting group.

Change-Id: If5b4659fa90c5d8e9106d260b530b47c735acb0e

Separated responsibility for consistency checking

Printing reports about GPU usage can and should happen before
checking for consistency. Note change to triggering logic - it should
be printed when GPUs are used, not when compatible GPUs are detected.

Moved the point of call for the notes about user-selected GPU sharing,
and under-use of GPUs to be part of the GPU usage report. This is a
cleaner implementation, makes sense whether or not the intervening
code will issue a warning or error, and removes the temptation to
repeat aspects of the GPU usage report.

Removed verbose catching of exceptions. Only std::bad_alloc can
throw, and it's a lesser evil to use the std:: functionality and catch
at a high level than to uglify the code at the point of call to denote
where exception-safe boundaries exist.

Our error handling on multiple MPI ranks is currently crude, and in
the hope that the MPI runtime will help out, we use a barrier. The
barrier helps to cater to lots of previous code where gmx_fatal()
might be called on any rank in the simulation, so moved it up to the
runner level. Also used the correct communicator (so that PME-only
ranks will also participate in the barrier). Note that
gmx_fatal_collective() exists, but is only suitable for coordinating
the output in the case where all ranks are known to have observed the
same error condition.

Converted some comments to Doxygen.

Change-Id: Ia4cfd69aa6e3b158244ae8da44317adf8257739b

Remove unnecessary GPU checking code

These were introduced when the early OpenCL implementation
had limitations.

This commit resolves the TODO to remove them, now that all
simulation code paths are supported on both GPU code paths.

Change-Id: Ic601d90d8c57e2a471785d196c75bfd5e075cf95

Cleaned up bUseGpu in runner

This variable was rather unclear about what it actually meant, and
various comments were wrong. This had several effects.

It obscured that automated assignment of GPU IDs to PP ranks actually
does take place after thread-MPI thread launch.

Fixing it revealed that we set up nstlist and rlist for the Verlet
scheme unnecessarily early, and this leads to an extra broadcast
(which is not costly, because there are many other broadcast calls
nearby).

Previously, -nb gpu and -nb auto would trigger GPU detection, and the
result of the detection was used to decide whether GPU assignment
might take place. tryUsePhysicalGpu and forceUsePhysicalGpu now play
the latter role, so that it is clear that we have the flexibility to
avoid assigning work to GPUs when that does not make sense (e.g. rerun
with energy groups), even if GPUs are detected. In particular,
tryUsePhysicalGpu needs to default to false when GPU support is not
configured, and when the .tpr is for the group scheme.

Removed strange comment about ignoring manually selected GPUs, which
may have been an historical artefact about doing so on PME-only ranks.

Change-Id: I69f21210a7ee931e5fa2cbe49e0999ad89ba7426

Rework and rename gmx_select_gpu_ids

This change makes clear that this function now produces a validated
mapping of intra-node PP ranks (ie with GPU tasks) to compatible GPUs
on that node. If the user provided GPU IDs, then exit if that mapping
is invalid. Otherwise, produce a valid mapping.

Noted TODOs relating to our lack of feature-complete error-handling
infrastructure.

Change-Id: I63ee0e5fbdce87cdefd458d3024a0de8dca472a5

Stop duplicate printing of detected GPUs

The detected GPUs and their compatibility status are always printed to
both stderr and log file immediately after detection, as part of the
normal hardware report.

When the user made an invalid selection, we should not print the
information about the detected GPUs again (differently).

When an auto-selection is made, we do not need to print information
about the detected GPUs again.

sprint_gpus is now local to a single source file.

Noted TODO to separate a printing functionality from a checking
routine.

Change-Id: I47ef0da6bdf58d9a61b8577e539effd96c771b8c

Fix use of GMX_SIMD in own-fftw build

Recent change separated the variables that contain the user's input
from the resulting action, but this aspect of the behaviour was
missed.

Change-Id: I8881e08ac8deb818f86c0db737beedeea77e4a5b

Removed unnecessary static declarations for command-line parsing

For historical reasons command-line parsing needed static variables
when declaring the t_pargs structures, however modern compilers
do not need this anymore. Simultaneously, repeatedly calling a function
containing static variables may lead to irreproducible results.
The programs fixed here have no unit tests, however grompp and pdb2gmx
are tested in the regression tests, which pass.

Fixed uninitialized variable that turned up when not static anymore.

TODO: fix the same in analysis tools.

Part of #2113.

Change-Id: I1f67cc54362062289264b83078d3317f8e914faf

Made post-submit config to test clang+openmp.

Resolves a testing matrix TODO, adds one for
clang+openmp+cuda

Change-Id: Iab628029d6020491f83a7e03f7c5cb7289f8154c

Fixed test-order dependency in gmxpreprocess tests.

Due to the use of static variables in genconf.cpp and
solvate.cpp, used for initializing command-line args,
the results of the tests were dependent on the order of
execution. The reason is that static variables set in one
of the test runs are "inherited" by the next run. A
particular example is a random number seed, which when 0
is set to a random seed, but just the first time around. This
is obviously not a good idea when aiming for reproducibility.
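
The effect can be reproduced in a few lines. The sketch below is
illustrative only: runToolStatic, runToolLocal and makeRandomSeed are
hypothetical names, and makeRandomSeed is a deterministic stand-in for a
real random source so that the behaviour is easy to follow.

```cpp
// Deterministic stand-in for a random seed generator (assumption made
// for this example): returns 1001, 1002, ... on successive calls.
static int g_nextSeed = 1001;
static int makeRandomSeed()
{
    return g_nextSeed++;
}

// Option storage is static: the default of 0 is applied only on the
// first call, so later calls inherit whatever the previous call left.
int runToolStatic(bool seedGiven, int seedFromCommandLine)
{
    static int seed = 0;
    if (seedGiven)
    {
        seed = seedFromCommandLine;
    }
    if (seed == 0)
    {
        seed = makeRandomSeed();
    }
    return seed;
}

// Option storage is local: every call starts from the default again.
int runToolLocal(bool seedGiven, int seedFromCommandLine)
{
    int seed = 0;
    if (seedGiven)
    {
        seed = seedFromCommandLine;
    }
    if (seed == 0)
    {
        seed = makeRandomSeed();
    }
    return seed;
}
```

Calling runToolStatic(false, 0) twice returns the same seed both times,
because the second call never sees 0, whereas runToolLocal(false, 0)
picks a fresh seed on each call.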

Part of #2113.

Change-Id: I8bd8d36c75dcd20824e7f3db7268512387cd04ca

Fix non-reproducibility of clustsize tests.

Due to the use of static variables in the old command-line
parsing codes, running the same test repeatedly would "inherit"
settings from the previous test. This is fixed here by
removing the static flags and updating some test results.

Part of #2113

Change-Id: Ie5bdf4aaefb8acfdb07a4ee65652a1c3c8ca84e6

Improve CMake implementation for GMX_SIMD

It is robust to have cache variables that are user choices (and to
refrain from modifying them), or the result of system
introspection. We should not cache the result of subsequent
fast-running logic.

Introduced the new default for GMX_SIMD, which is AUTO. The value AUTO
triggers the same detection path that used to set the old "default",
producing the same SIMD choice, but leaves the value of GMX_SIMD as
what the user chose, ie "AUTO". That means e.g. ccmake will now show
the user their choice, not the result of the detection. The choice
is written to the status output each time it changes (even to and
from AUTO).

Explicit non-AUTO choices for GMX_SIMD work the same way they used to.

Simplified the implementation surrounding the run of the detection
code. We can just store the feature string if the run was successful,
and not otherwise, and that works also to prevent unnecessary re-runs
of the detection.

This will now make it possible for us to implement other logic that
reacts to a SIMD choice differently according to whether it comes from
the user or detection.

The implementation of GMX_USE_RDTSCP now works better, because RDTSCP
is now on if the user chose an AVX-era SIMD level, off if an older
SIMD level was chosen, and suits the build host if the SIMD choice is
also automated to reflect the build host.

Moved a compiler-problem check to the SIMD-management code so that
it can use GMX_SIMD_CHOICE after it is defined.

The FFT-management code now uses GMX_SIMD_ACTIVE, which works because
it runs after the SIMD management, which uses a macro (so shares the
same variable scope as the calling code).

mdrun binary information still reports the actual choice, and not
whether it was based on AUTO.

Verifying that the rest of the code works correctly is complicated by
the way the SIMD module defines GMX_SIMD, and that is used e.g. by the
nbnxn, lincs and bonded code.

Minor fix to docs of gmx_check_if_changed.

Change-Id: I54e140394ea0c452450ebebad5782849e92a5ce2

Prune unnecessary cells in analysis nbsearch

When doing a grid-based neighborhood search, do not compare pairs in
grid cells that are completely outside the cutoff. This excludes up to
40% of the search volume (if the grid cells are much smaller than the
cutoff), but more realistically the speedup is of the order of 10-20%.
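
One way to express such a test, shown here as a sketch rather than the
nbsearch implementation, is to compute the minimum distance from the
reference position to an axis-aligned cell and skip the cell when that
distance exceeds the cutoff. The function names and the rectangular,
non-periodic cell are assumptions made for the example.

```cpp
#include <algorithm>
#include <array>

using Vec3 = std::array<double, 3>;

// Squared distance from point p to the closest point of the axis-aligned
// cell [lo, hi]; zero when p lies inside the cell.
double minDistanceSquaredToCell(const Vec3& p, const Vec3& lo, const Vec3& hi)
{
    double d2 = 0;
    for (int d = 0; d < 3; d++)
    {
        const double below = std::max(lo[d] - p[d], 0.0);
        const double above = std::max(p[d] - hi[d], 0.0);
        const double dist  = std::max(below, above);
        d2 += dist * dist;
    }
    return d2;
}

// Hypothetical predicate: true when no point in the cell can be within
// the cutoff of p, so the cell need not be searched at all.
bool cellCanBeSkipped(const Vec3& p, const Vec3& lo, const Vec3& hi, double cutoff)
{
    return minDistanceSquaredToCell(p, lo, hi) > cutoff * cutoff;
}
```

The cells that this rejects but a per-dimension range check keeps are
the ones near the corners of the bounding cube around the cutoff sphere.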

Change-Id: Ie8965b001cc20f739432ecdce9011b686c76236a

Converted some const pointers into const references, per style

There's more we could do here (later), but this caught my eye while
cleaning other things up.

Change-Id: I55509985172ce7c53cc4f3a9afbb903fd6adfc36

Eliminated pick_nbnxn_resources

The data about which GPUs on a node will be used by the PP tasks on it
is stored when the dev_use array is filled, either by user selection
or the automated per-rank round-robin assignment in
gmx_select_rank_gpu_ids. It is filled so that the PP intra-node rank
IDs will index into it, so aspects of the task assignment can be seen
as already complete. The code in pick_nbnxn_resources makes the final
action of indexing with the intra-node rank ID, then handles the
initialization of the selected GPUs.

Note that nothing meaningful has happened since the call to
gmx_select_rank_gpu_ids. mdrun has done some checks, called
init_forcerec (which then set up a lot of unrelated things), and then
called init_nb_verlet. The actual task assignment (selecting the GPU
for this task) is where it always was (ie implicitly somewhere between
filling the dev_use array, and the act of actually indexing into it),
and future design options are still as open as they ever were.

Change-Id: Iac908bb74dbec20edaecf6fbfcd18b87179e7aba

Fix hwloc detection

In some cases, including the release workflow build in Jenkins, this
could fail because it uses a CMake macro without including the file
where it got defined.

Change-Id: I83646e0b489c8930a5b6d7fdb2584cb760282fdb

Implemented grompp -po for options-style mdp handling

Previously options that were handled via a key-value tree (currently
only electric field options) were not written to the .mdp output.
Restoring this functionality will permit more widespread use of the
key-value tree handling during the transition period.

Restored mdp file comments for electric-field options, but the method
for writing them is hacky. This is probably good enough for
functionality that will disappear when we shift to key-value input.

Change-Id: I488b37ff72f70a6e145338fd9d57daa1c9e4de7e

Clean up cmake build host detection

This resolves some existing TODOs to de-duplicate code,
while making it more robust to re-execution and proper
use of the cache.

Change-Id: Ie9c0618778b6da2aace37baa182785604c8f4afe

Clean up gmx_mdrun()

NFILE is inconsistent with usage in the rest of mdrun, changed to
nfile.

Removed outdated comment.

Changed several variable names to *_choices, because in two cases the
actual selection gets passed to runner() as a pointer to the updated
first element of the choices (which is organized by
parse_common_args()). It's easier to understand that the variables
have different roles in the two places if they have different names.

Renamed nstlist and nsteps to the same with the _cmdline suffix that
is used in mdrunner().

Change-Id: Id99fa26b7ab38cd67de8f259c612179ad7f1e992

Cleaned up mdrunner_start_threads

Threw an exception rather than return a nullptr to trigger a
fatal error.

Removed unnecessary variables. Parameters passed by value are copied,
so there is no need to make that explicit just because the return
value will be stored in the same variable that was just the source of
a parameter.

Noted TODOs for removing some allocation and leaks if we progress to
the point where mdrun filename arguments can be proved to be used only
in a const-safe read-only manner.

Updated and converted comments to Doxygen

Change-Id: I638c757d2d0aa5175dfe3769b499993eb613ea26

More updates to testing matrix

Covered newest cmake, while ensuring that we test the new
and old FindCUDA modules.

Added new configuration to target gcc-7.

Moved x11 responsibility to a non-GPU configuration.

Moved a simd responsibility off the tsan configuration, which has
a TODO to get moved off the GPU build slave. Updated it to gcc-7
which should be useful.

Consolidated three jobs that built on bs_nix-amd into two.

Fixed minor issues with possible sprintf over-run or snprintf
truncation, and possible use of uninitialized pointers.

Fixed release matrix use of gcc-[567] versions. The release-2016
branch uses older compiler versions.

Change-Id: Iafd6a440fd763ef6c645bb094579e0ec6875621b

PME-gather: make variables local and const

Change-Id: Ic03622cf06418ffe2b667cb7ad96ccbfb44695ee

Support scoped key-value tree transform rules

Resolve several TODOs about having the knowledge of the "applied-forces"
option section scattered in different places. It is now possible to
create a "scoped" key-value tree transform rule interface such that all
rules created through it target a specific subtree in the target tree.

This makes it possible to have initMdpOptions() and initMdpTransform()
operate on the same tree in electricfield.cpp, and have the higher-level
structure only declared in mdmodules.cpp. Remove the dependency on
MDModules from electric field tests, since it can now use these two
methods without knowing anything about the presence of the higher-level
"applied-forces" section.

Change-Id: I8107b10af3c9c602f40297a2279e3b0449e27e8d

Added missing .mdp file documentation for the enforced rotation module

Change-Id: I08398b0a53ef6154a6ac8005890e21f529659dd0

Restructure the load balancing timing

The load balancing region is now set by a (variable) start and end
point. This is much simpler than the tedious addition of timings of
many small regions during the force communication over multiple files.
The disadvantage of the current implementation is that one needs to
place a call to ddReopenBalanceRegionCpu() after every communication
call that can occur in the balancing region.
This change should avoid instabilities in DLB due to e.g. more time
being measured, but also available, when using GPUs and ranks with
less load starting earlier due to the constraints finishing earlier.

Change-Id: Idf73c3367adc269def533dfabf27df2ba4f6834f

Refactor error handling in init_gpu

Change-Id: I7fe1d196ed3696d359443553407862b08c089c4c

Simplify gmx_gpu_opt_t

This temporary data about compatibility can be recomputed simply at
need.

Change-Id: I70352d75a39f4f777823638c462ea108829e46be

Detect unknown mdp options in gmx dump/check

Make gmx dump and gmx check fail gracefully if an input tpr file has
options that are not known to the current code, instead of silently
ignoring them. gmx dump -orgir can still write out such files.

mdrun is using a different path (at least for now) that failed also
previously.

Change-Id: Ib61b03bb44c1510b03543f6e558caca5d44cb84d

Unify nbnxn kernel dispatchers

Reduced code duplication by merging the three nbnxn kernel dispatch
functions for C reference, simd_4xn and simd_2xnn into a single
dispatcher. This also removes the implementation details from
sim_util.cpp.

More reorganization could be done, but is not included here to
minimize the size of this change.

The original goal of this change was to conditionally compile all
files in the simd_4xn and simd_2xnn directory, but currently we
do not have the conditions available at configure time.

Change-Id: I41f7ae96d267a12cf6024866b59bb0cda7e1cd2f

Clean up pick_nbnxn_resources

GMX_NO_NONBONDED should not trigger GPU emulation, it should disable
nonbonded calculations. Removed that logic, and amended the
documentation.

Made emulateGpu a field of nonbonded_verlet_t, set by the environment
variable, and used it to trigger the subsequent paths.

This simplifies pick_nbnxn_resources.

Change-Id: I5ce4f69e470fe7e24bb554556211697d25f11b0f

Remove gmx_constexpr

gmx_constexpr was introduced because MSVC 2013 does not support
constexpr, but we no longer support version 2013. It was noted
that version 2015 also has issues, but our Jenkins MSVC 2015 setup
only produces a warning for a single line, which is now fixed inline.

Change-Id: I4f9321b5845ad89f3c13af3a0c9fa3c9f00b59d4

Cleaned up GPU emulation logic

It was unclear how GPU emulation was intended to work. I assume that
it should trigger construction of a GPU pairlist and the use of the
matching plain C kernel, but otherwise function like mdrun -nb cpu.

A new fatal error for an inappropriate combination is introduced.

gmx_select_rank_gpu_ids() now cannot be run if emulation is active,
because it does not make sense to run it.

Change-Id: I1ff687afa26a9daee437f72441c856b4963d2272

Remove outdated cmake nvcc logic

Change-Id: Id51f680a9237bc6fcbb7a9b7441a5fc158be1d47

Mark unmodified input parameter as const.

Change-Id: I9d84ae7d4635a5132b3d1c9ff2d17ed5b344ff0d

Fix boolean serialization

Serialize the value instead of address of the value...

Also, write out more information for key-value tree
serialization/deserialization tests in case they fail, which would have
helped seeing the problem in Jenkins.
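
The commit does not show the faulty line, so the following is only one
plausible way such a bug can arise in C++: a pointer passed where a bool
is expected converts implicitly to true whenever it is non-null. The
class ByteSerializer and both helper functions are hypothetical.

```cpp
#include <vector>

// Hypothetical minimal serializer that appends one byte per bool.
class ByteSerializer
{
public:
    void doBool(bool value) { buffer_.push_back(value ? 1 : 0); }
    const std::vector<unsigned char>& buffer() const { return buffer_; }

private:
    std::vector<unsigned char> buffer_;
};

// Buggy variant: the pointer itself is converted to bool, so the stored
// byte is 1 for any non-null pointer, whatever the pointed-to value is.
void serializeBoolBuggy(ByteSerializer* serializer, const bool* value)
{
    serializer->doBool(value);
}

// Fixed variant: dereference first, so the actual value is stored.
void serializeBoolFixed(ByteSerializer* serializer, const bool* value)
{
    serializer->doBool(*value);
}
```

A round-trip test that serializes false catches this kind of error
immediately, since the buggy variant always writes true.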

Change-Id: I427b80f38522e1b9792a8c49aa46ed2e85ff2ff7

Extend key-value tree comparison and tests

Cover comparison support for key-value trees for all basic option types
with tests, and implement support for missing types.

With this, all the basic option types should now be supported through
the whole chain of t_inputrec handling.

Change-Id: Id8f821e2bb7485eecea0922872d86bddcc69ef6f

Mark unmodified input parameter const

Change-Id: I01a6b180d8d4684e7888a9c5d753e6012df83779

Generalize IForceProvider

- Remove knowledge of individual modules from t_forcerec, and hide the
  number of modules, and whether they contribute to a virial or not,
  behind a generic interface.
- Improve hard-to-understand initialization of separate f_novirsum,
  resolving some TODOs.
- Add some parameters that probably make sense for other modules beyond
  the electric field one.

This also makes the requirement for all modules to have the same
parameters for the calculateForces() call explicit.

Change-Id: I4952515c4b707ba458fd267565fd000532ec281e

Extend key-value tree serialization and tests

Cover serialization support for key-value tree with basic option types
with tests, and implement missing support for bool values.

Change-Id: I630387393fce1435f64f9115870552067182145b

Cleaned up high-level boolean variable naming

Variables like bSIMD and bGPU were rather unclear about what
they actually meant.

Modernized to remove b prefixes and to use C++ bool type. Clarified
that some variables related to the use of a physical, rather than
emulated, GPU.

No changes to logic, just renaming.

Change-Id: I2bce7e1d554d3910fbbe685bae0d6a32bf50ac91

Merge branch release-2016

Change-Id: Idb888bea9e78a9979468ff0c2bc2c02443e02026

Merge "Merge branch release-2016"

Fix ICC workaround for C++11 feature check

- Fix that move constructor check was disabled for non-ICC
- Disable check for all supported ICC versions

Change-Id: I295ffb9190d6c02540e709fafd9330a581afd1de

Do not use conf-man.py for non-manpage doc builds

Avoid importing conf-man.py for Sphinx invocations that do not build man
pages. Since it only specifies man page build rules, it is not
necessary, and some of the targets do not have dependencies that would
ensure that it is present.

Fixes #2184.

Change-Id: I3fe9cb03667c9a4e1d4bb9e02b65544e8250be31

Merge branch release-2016

Conflicts:
admin/builds/pre-submit-matrix.txt

Checked that intent of test configs is implemented appropriately in
new context. The cuda-8 config newly introduced in release-2016
is already covered in this branch, so we don't need to have that
new configuration here.

src/gromacs/gpu_utils/gpu_utils.cu
src/gromacs/simd/impl_ibm_qpx/impl_ibm_qpx_simd4_double.h
src/gromacs/domdec/domdec.cpp

Trivial

Change-Id: I9b06bb476f5b62c9652c0e5186340ed11b0c31cc

Minor improvements to test configs

No functional changes - this just improves the way we specify and
document intent.

Unless there are interaction effects being tested, only one
configuration should document that it is intended to test e.g. "thread
MPI with CUDA".

Made that specific for one such config, and removed the documentation
from a second config, even though it incidentally tests such code
(because it is currently the default).

Added a config that specifically tests thread-MPI (which it was
coincidentally already doing).

Clarified, for the test config that covers the SIMD implementation of
the pair-list search for use on the GPU, that which SIMD implementation
of the search is tested is merely coincidental, rather than a specific
choice.

Change-Id: Id1889753623f7808cd1bb2fb060ee2b2852dfd94

Fix ARM NEON simd debug builds

Fixes compile-time logical errors. This cannot have
caused any incorrect runs since the code would not
even compile before with a debug build.

Fixes #2209.

Change-Id: Iabdf3ba113e0ddb329ae917f511955cf4c65ed4f

Add ARM_NEON_ASIMD post-submit test

Will run on the Jetson TX1 dev board.

Change-Id: Ia7907d2c11cdd46dd6680f984a6d52e1913c0453

Updated application clock handling on Pascal+ GPUs

Starting with Pascal (CC >= 6.0) it is no longer possible to change
application clocks without root privileges. With this patch application
clocks are only reported for Pascal+. To avoid meaningless warnings in
case application clocks have already been set externally, the function
init_gpu_application_clocks is now exited early if no application clock
changes are necessary.

Change-Id: I1d99ebff1fa32ba1fd44a37dcb43158da733daed

Fixed #2206 IMD interface malfunctions

Change-Id: Ia58586a281591cefea8a382a40e92e3e30b56b75

Consolidated logic for choosing number of thread-MPI ranks

Noted TODO to handle issue that was always present, but is now
easy to see needs handling.

Added explicit fatal error for a case that a comment claimed was
handled later, but for which I could not find handling.

Removed some logic for nthreads_tot == 1 that was redundant with
that in get_nthreads_mpi and check_and_update_hw_opt_3.

Added explicit checks for -ntmpi and -ntomp greater than -nt,
replacing an old check that gave an incorrect message, and was only
used when -nt selected 1 total thread.

Change-Id: I6b6634ac4dd726a784a626624de4405a4ddb07f0
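
The explicit -ntmpi/-ntomp versus -nt checks can be sketched as below. This is an illustration under stated assumptions, not the mdrun implementation: ThreadCounts and checkThreadCounts are hypothetical names, 0 is assumed to mean "not set", and the message wording is invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the three options; 0 means "not set by the user".
struct ThreadCounts
{
    int nt;    // -nt:    total number of threads
    int ntmpi; // -ntmpi: number of thread-MPI ranks
    int ntomp; // -ntomp: number of OpenMP threads per rank
};

// Returns an empty string when the counts are consistent, otherwise a
// message naming the option that exceeds the requested total.
std::string checkThreadCounts(const ThreadCounts& counts)
{
    assert(counts.nt >= 0 && counts.ntmpi >= 0 && counts.ntomp >= 0);
    if (counts.nt == 0)
    {
        return ""; // no total requested, nothing to compare against
    }
    if (counts.ntmpi > counts.nt)
    {
        return "-ntmpi (" + std::to_string(counts.ntmpi) + ") is greater than -nt ("
               + std::to_string(counts.nt) + ")";
    }
    if (counts.ntomp > counts.nt)
    {
        return "-ntomp (" + std::to_string(counts.ntomp) + ") is greater than -nt ("
               + std::to_string(counts.nt) + ")";
    }
    return "";
}
```

Checking each option separately lets the message name the offending flag, which a single shared check cannot do.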

Replace misnamed nbnxn_gpu_acceleration_supported

Historically, there were paths that could not run on GPUs, but the
only remaining condition here relates to a path that merely is not
useful to run on a GPU. Thus, I renamed the function, and relocated it
to the single point of use, preparing for future clean up there.

Removed a TODO that is no longer applicable.

Change-Id: I7ace37af14b01b8f6bc7944073951f79aecb8e65

Consolidate and fix logic for mdrun -nb and -gpuid

Several aspects of task assignment did not work as well as they should.

If gmx mdrun -gpu_id 01 is intended to specify that work runs on those
GPUs, then e.g. if the tpr uses the Group scheme, mdrun should
refuse to run, just like it does for mdrun -nb gpu. Now it does
refuse.

gmx mdrun -nb cpu -gpu_id 01 should always give a fatal error, and now
does.

CUDA_VISIBLE_DEVICES="" gmx mdrun -nb gpu and the same with -gpu_id
should give a fatal error, and now does.

After this change, if the user has required short-ranged work on a GPU
with -nb gpu, or made an explicit GPU task assignment with -gpu_id
without using -nb cpu, exit quickly unless GPU support is compiled,
the Verlet scheme is active, and GPUs were found.

Introduced a helper function for whether compatible GPUs have been
found, to help improve encapsulation and readability.

Removed hack from mdrun integration tests that coped with early
implementations of -gpu_id, which is no longer needed.

Fixes #2067

Change-Id: Ic5091edc892b0fcb0371720a5000b80019b5b3d2
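
The rules listed in this commit message can be condensed into one decision function. The sketch below is only an illustration: NonbondedTarget, GpuRequest and shouldExitWithFatalError are hypothetical names, and the real mdrun code is organised differently.

```cpp
#include <cassert>

// Hypothetical mirror of the -nb auto/cpu/gpu choices.
enum class NonbondedTarget
{
    Auto,
    Cpu,
    Gpu
};

// Hypothetical summary of what the user asked for and what is available.
struct GpuRequest
{
    NonbondedTarget nb;                  // value of -nb
    bool            userAssignedGpuIds;  // an explicit GPU task assignment was given
    bool            gpuSupportCompiled;  // the build has GPU support
    bool            verletSchemeActive;  // the tpr uses the Verlet scheme
    bool            compatibleGpusFound; // at least one usable GPU was detected
};

// Returns true when the run should stop early with a fatal error.
bool shouldExitWithFatalError(const GpuRequest& r)
{
    if (r.nb == NonbondedTarget::Cpu)
    {
        // -nb cpu combined with an explicit GPU assignment is contradictory.
        return r.userAssignedGpuIds;
    }
    const bool gpuRequired = (r.nb == NonbondedTarget::Gpu) || r.userAssignedGpuIds;
    if (!gpuRequired)
    {
        return false;
    }
    return !(r.gpuSupportCompiled && r.verletSchemeActive && r.compatibleGpusFound);
}

// Usage example covering the cases named in the commit message.
void gpuRequestUsageExample()
{
    // -nb cpu -gpu_id 01
    assert(shouldExitWithFatalError({ NonbondedTarget::Cpu, true, true, true, true }));
    // -gpu_id 01 with a Group-scheme tpr
    assert(shouldExitWithFatalError({ NonbondedTarget::Auto, true, true, false, true }));
    // -nb gpu with no visible devices
    assert(shouldExitWithFatalError({ NonbondedTarget::Gpu, false, true, true, false }));
}
```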

Re-enable lost DD warning + minor code modernization

After the message got refactored in 147de64 the dd_warn() call was left
out and therefore a DD warning message would get assembled but never
printed.

Change-Id: I860f52cf7b15e18b8a3533237dcc217b3b844b7e

Reform ngpu variable

This had different interpretations before and after MPI, and you had
to read the MPI call to work out the meaning. Made different
variables that directly reflect the two meanings.

Change-Id: I55b6f0b7d05c3286244231af7a5ef3e031079859

Remove bUserSetGpuIds field

This is only used for a short time during setup, so might be better as
a temporary variable, rather than a field of a struct that persists
for the whole of mdrun.

Moved a single-use const string for an error message from
gmx_parse_gpu_ids to its place of use, and fixed that its content had
fallen out of date because it was in that other place.

Change-Id: I1d1a771a2de423a65a22714ecf86795a0dcfe2a9

Re-enable accidentally lost DD warning

After the message got refactored in 147de64 the dd_warn() call was left
out and therefore a DD warning message would get assembled but never
printed.

NOTE: skip when merging into master, the fix has been added together
with modernization (Change-Id I860f52cf)

Change-Id: Ieaeae3bae69e029b671bf18a3cce6b1d2aebcea9

Refactoring + clarification of DLB state handling

The change does a few refactoring and renaming steps in order to clarify
DLB internal state handling code:
- separates the two DLB "off" states depending on whether the user
turned it off or mdrun switched it off due to some internal condition;
- clarifies the naming of user-requested "on"/"off" states;
- disallows overriding a user-requested DLB "on"/"off", only allowing
mdrun to fall back to disabling when "auto" is passed on the command line.

Change-Id: I5d7ed07deeded3f135884fc22fe558eabfe68533
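
One way to picture the separated states is the sketch below. The enum and function names are hypothetical and the set of states is simplified; the actual domain decomposition code may use different names and more states. The point shown is that an internal "switch off" is recorded separately from a user "off" and never overrides a user "on".

```cpp
#include <cassert>

// Hypothetical mirror of the user request on the command line.
enum class DlbOption
{
    Auto,
    No,
    Yes
};

// Hypothetical internal states, with the two "off" variants kept apart.
enum class DlbState
{
    OffUser,      // the user turned DLB off
    OffInternal,  // mdrun switched DLB off due to an internal condition
    OffCanTurnOn, // "auto": currently off, mdrun may change it
    OnUser        // the user turned DLB on
};

DlbState initialDlbState(DlbOption option)
{
    switch (option)
    {
        case DlbOption::No: return DlbState::OffUser;
        case DlbOption::Yes: return DlbState::OnUser;
        case DlbOption::Auto: return DlbState::OffCanTurnOn;
    }
    assert(false && "unhandled DlbOption");
    return DlbState::OffUser;
}

// Attempts to disable DLB for an internal reason. Returns false when that
// would override an explicit user "on", which the caller should report as
// an error instead of silently changing the state.
bool tryDisableInternally(DlbState* state)
{
    assert(state != nullptr);
    if (*state == DlbState::OnUser)
    {
        return false;
    }
    if (*state == DlbState::OffCanTurnOn)
    {
        *state = DlbState::OffInternal;
    }
    return true;
}
```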

Add tolerance to calc_verletbuf test

Change-Id: I4cf79e7ca4411fb326d90916f8afcb218f01b176

Fixed a consistency check in make_edi for flooding

If one sets up a flooding .edi input file with gmx make_edi,
the code should check that one does not use any of the last 6 eigenvectors
of the covariance matrix, which correspond to the rotational and
translational degrees of freedom.
The check that was in the code erroneously checked against the
number of eigenvalues neig that was stored in the .xvg file,
not against the total number of eigenvectors which depends on
the number of atoms nav used in gmx covar. Thus the original
check would always fail if the .xvg eigenvalue file contained
1-6 values only.

Change-Id: Ib293f6b69b80fbf014a7507431beaf1f939849ac
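
The difference between the two checks can be made concrete with a small sketch. It assumes 1-based eigenvector indices and that gmx covar on nav atoms yields 3*nav eigenvectors; both are assumptions for illustration, the function names are hypothetical, and the "old" variant is only a reconstruction of the behaviour described above, not the original code.

```cpp
#include <cassert>

// Number of eigenvectors excluded from flooding: 3 translational + 3 rotational.
constexpr int c_numRigidBodyModes = 6;

// Corrected check: compare against the total number of eigenvectors,
// assumed here to be 3 * nav.
bool isUsableForFlooding(int eigenvectorIndex, int nav)
{
    assert(nav > 0);
    const int numEigenvectors = 3 * nav;
    return eigenvectorIndex >= 1 && eigenvectorIndex <= numEigenvectors - c_numRigidBodyModes;
}

// Reconstruction of the faulty check: compares against neig, the number of
// eigenvalues stored in the .xvg file, which can be much smaller.
bool isUsableForFloodingOld(int eigenvectorIndex, int neig)
{
    return eigenvectorIndex >= 1 && eigenvectorIndex <= neig - c_numRigidBodyModes;
}
```

With nav = 10 there are 30 eigenvectors under this assumption, so indices 1 to 24 pass the corrected check, whereas the reconstructed old check rejects every index whenever the .xvg file holds 6 or fewer values.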

Allow disabling the explicit use of CUDA textures

This change implements fallback for the explicit CUDA texture loads
in the non-bonded kernels. This can be done by defining
DISABLE_CUDA_TEXTURES. When disabled, texture objects/references
are not initialized either.

Also removed unnecessary extern declarations of texture references
in nbnxn_cuda_kernel_utils.cuh; this was only needed because texture
reference accesses were previously compiled unconditionally (and were
also generated in the nvcc host pass).

Change-Id: Id7cdd6f80da0abe6be5639e80bed6530c3ce25c0

Simplify reporting from init_gpu

makeGpuUsageReport reports whether the usage was selected by the user
or not, and has been run before we call init_gpu.

In any case, the reporting from init_gpu should only happen if mygpu
contains an invalid value, which should never happen so late in the
code, i.e. after all the previous consistency checks. Knowing whether
the user selected the GPUs or not doesn't help much, and that
information is available from makeGpuUsageReport, anyway.

Change-Id: I1b701dceb4f4c5ee62175827b6a70bb134b0f66a

More quotes

Change-Id: Ia60e6cac8acbddad4857af8a56dee173002dcba9

Fix null pointer print in DD

Fixed a (rather harmless) print of a null pointer string during
DD initialization. This would only show up with mdrun -dlb yes.

Change-Id: I2ea16ce11a969d6102c6b6cac0d3853aa9dd4c86

Move gmx_parse_gpu_id out of checking function

Parsing user input should happen separately from running checks for
consistency or sanity. Some of the content of gmx_parse_gpu_id is more
like consistency checking, so left that where it was, using the output
of the parsing to run the checks. Doing so means that the first such
check now works when GMX_GPU_ID is used, when previously it did not.

Made minor improvement to error message text for that check.

Noted TODO to remove helper functionality for GPU sharing checks,
since it is now supported everywhere.

Change-Id: I2ea5841f0fdf461f3024b442a2fe641ea1435f49

Store a physical node communicator in t_commrec

Introduced a physical node communicator and a corresponding barrier
function, which is called within thread-MPI builds before
the GPU context deallocation, instead of a task-group (PP/PME) barrier.

Change-Id: I394db9d223f32dc2e093c0757889ba48485a8a88

Updated TNG to version 1.8

Corresponds to commit 9d76cd757cfe13eefc554407683b462c4f414390
in the TNG repository.

Added data block for atom masses.
Fixed some bugs and warnings.

Change-Id: I45cfd2f183b338fb1ac43719d62a5c63c3e35f94

Put replica exchange parameters in a struct

Change-Id: I56d3b19c3e3c83bf99ac622930fb20c1efdc129d

Moved essential dynamics initialization call into do_md()

- init_edsam is now called in do_md() in the same code block
where the membed and the swap code gets initialized
- the ed struct is removed from the integrator calls
- removed a TODO, which was actually already done (closing of edo file)
- Fixes one of the TODOs left from gerrit.gromacs.org/#/c/5660/

Change-Id: I396cf943fe34ef1683204effd4d5e935e6132dd7

Replace iso by instead of

Replaced "iso" by "instead of" in all user-facing text.

Fixes #2202.

Change-Id: I126b5d06bb933d68c22faae3b8f39de38b704b78

Adopt and document implementation policy for MD modules

This approach concisely expresses intent in a way that a tool/compiler
could enforce, and which provides a hint to discourage development
from using module classes as bases of inheritance. This conforms to
CppCoreGuidelines C.128 and is now documented in the high-level
developer docs.

Change-Id: I8312af73d1ffd8c0a4234fc30488d016dd969a9f
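
CppCoreGuidelines C.128 asks that each virtual function specify exactly one of virtual, override, or final. The sketch below is one plausible reading of such a policy, not the documented GROMACS rule itself: IModuleSketch and ElectricFieldSketch are hypothetical names, and marking the module class final is an assumption about how the "not a base class" hint is expressed.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Interface: declares its functions with "virtual" only.
class IModuleSketch
{
public:
    virtual ~IModuleSketch() = default;
    virtual std::string name() const = 0;
};

// Module: marked final so it cannot serve as a base class, and its
// implementation uses "override" only.
class ElectricFieldSketch final : public IModuleSketch
{
public:
    std::string name() const override { return "electric-field"; }
};

// The compiler can verify the "not a base class" intent.
static_assert(std::is_final<ElectricFieldSketch>::value,
              "module classes are not meant to be derived from");

// Usage example: the module is only ever used through its interface.
void moduleUsageExample()
{
    ElectricFieldSketch  module;
    const IModuleSketch& base = module;
    assert(base.name() == "electric-field");
}
```

Attempting to derive from ElectricFieldSketch fails to compile, which is the kind of tool-enforced intent the commit message refers to.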

Fix COM pull force with SD

The reported COM pull force when using the SD integrator contained
only the random component. Now the pull force is summed over the
systematic and random SD update components.
A better solution is to not add the random force at all, but such
a change should not be done in a release branch.

Fixes #2201.

Change-Id: I10a56b30b952869396d170914bcbc0163299a1c8