BioD PNPI Git Repos - alexxy/gromacs.git/log

Aleksei Iupinov [Fri, 22 Sep 2017 13:35:28 +0000 (15:35 +0200)]

Template/move CUDA texture cleanup code from NB CUDA module to cudautils.cu

Noted TODO: easy transformation into a GPU table class.

Change-Id: I20d684221fa8304d01ab7fd4a19f2c4469110142

Szilárd Páll [Wed, 25 Jan 2017 01:53:17 +0000 (02:53 +0100)]

Enable compiling CUDA device code with clang

clang can be used as a device compiler by setting GMX_CLANG_CUDA=ON. A
CUDA toolkit (>=7.0) is also needed. Workarounds required:
- texture operations are not supported, use the LDG/direct load-based
  fallback in such cases;
- CMake does not natively support clang for CUDA, but it is easy to
  convince it by setting CXX as the compiler and a few extra flags for *.cu.

Note that clang support is experimental and is aimed at improving
portability and allowing the use of clang sanitizers without hassle in
CUDA builds.

TODO/investigate:
- CMake seems to not track some files properly with clang, changes
  to nbnxn_cuda_kernel{,_fermi}.cuh do not trigger a recompile (likely
  due to the indirect include through a macro in nbnxn_cuda_kernels.cuh).
- Full rebuild is triggered even if only CUDA compile flags are changed.

Change-Id: I3543469d9f0fda37c186ba8bb474980018bd5c54

Berk Hess [Thu, 28 Sep 2017 21:01:14 +0000 (23:01 +0200)]

Move calling of special force functions

All special force provider algorithm functions are now collected
in computeSpecialForces(). This function is now called before
the wait for non-bondeds on the GPU. This allows for more overlap.
The only cost is potentially one extra force buffer reduction at
steps where the virial is needed, but with PME we already do that.
To achieve this, f->f_novirsum is now always set, as the
documentation already (incorrectly) stated.

This change also simplifies both do_force routines and clarifies
where special algorithms can be called, which could now actually
be anywhere in do_force().

Change-Id: I0711a379ed3c31838ede9e55c4cd5d0a95e967fd

Aleksei Iupinov [Wed, 20 Sep 2017 13:45:13 +0000 (15:45 +0200)]

Add WinAPI page size query for allocator purposes

Change-Id: I4c1e534a4b950bc38d45c40ad5598cabb2045ede
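
A minimal sketch of what such a page size query can look like. The helper
name queryPageSize is hypothetical and this is not the code from the commit;
it only relies on the WinAPI call GetSystemInfo() and, as an assumed
counterpart on other platforms, POSIX sysconf().

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#    include <windows.h>
#else
#    include <unistd.h>
#endif

// Hypothetical helper (not the GROMACS function): OS page size in bytes.
inline std::size_t queryPageSize()
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info); // WinAPI call that fills in dwPageSize
    return static_cast<std::size_t>(info.dwPageSize);
#else
    const long size = sysconf(_SC_PAGESIZE); // POSIX query, assumed counterpart
    assert(size > 0 && "sysconf should report a positive page size");
    return static_cast<std::size_t>(size);
#endif
}
```

An allocator could round its requests up to a multiple of this value when
page-aligned memory is wanted.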

Magnus Lundborg [Tue, 13 Jun 2017 15:04:10 +0000 (17:04 +0200)]

Write atom masses and partial charges to TNG

B states of molecules are not yet written. Data blocks for that
must be added to TNG first.
'gmx dump' will print atom masses and partial charges along
with the molecule, if the data is available. No other utilities
use the data yet.

Refs #2188.

Change-Id: I7dd80d7b6281b2c3710fb541fa3cee6fbdcb2256

Mark Abraham [Mon, 3 Jul 2017 17:40:10 +0000 (19:40 +0200)]

Introduce and use AlignedAllocationPolicy

This permits the allocators to be specialised with a policy class that
can vary (and report on) the alignment.

Change-Id: I875548ac325edcf07074ad35f9d90cdf561ea750
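
The idea of an allocator specialised with an alignment policy can be
sketched as follows. All names here (FixedAlignmentPolicy, PolicyAllocator,
allocate, release) are illustrative assumptions rather than the interface
introduced by the commit; the sketch over-allocates with std::malloc and
stores a back-pointer so that it stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical policy: owns the alignment and can report it.
template<std::size_t Alignment>
struct FixedAlignmentPolicy
{
    static_assert(Alignment >= sizeof(void*) && (Alignment & (Alignment - 1)) == 0,
                  "alignment must be a power of two of at least pointer size");

    static constexpr std::size_t alignment() { return Alignment; }

    static void* allocate(std::size_t bytes)
    {
        // Over-allocate so an aligned address with room for a back-pointer exists.
        void* raw = std::malloc(bytes + Alignment + sizeof(void*));
        if (raw == nullptr)
        {
            return nullptr;
        }
        const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        const std::uintptr_t aligned =
                (start + Alignment - 1) & ~(static_cast<std::uintptr_t>(Alignment) - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw; // remember what to free
        return reinterpret_cast<void*>(aligned);
    }

    static void release(void* p)
    {
        if (p != nullptr)
        {
            std::free(reinterpret_cast<void**>(p)[-1]);
        }
    }
};

// Hypothetical allocator front end, parameterised on the policy.
template<typename T, typename Policy>
struct PolicyAllocator
{
    using value_type = T;

    static constexpr std::size_t alignment() { return Policy::alignment(); }

    T* allocate(std::size_t n) { return static_cast<T*>(Policy::allocate(n * sizeof(T))); }
    void deallocate(T* p, std::size_t /*n*/) { Policy::release(p); }
};
```

Swapping the policy type changes the alignment without touching the
allocator itself, which is the kind of variation the commit describes.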

Berk Hess [Wed, 9 Aug 2017 12:06:01 +0000 (14:06 +0200)]

Remove hybrid gpu+cpu nonbonded mode

This mode was not very useful, since it ran the non-local non-bonded
interactions on the CPU. The fraction of non-local interaction is set
by the domain decomposition, so this is not flexible.
Also this mode is not being tested.

Amended docs to remove ambiguous reference to hybrid mode

Change-Id: I8cdd31f228e2c0104527f66265fd06a5e971b9da

Szilárd Páll [Thu, 28 Sep 2017 14:22:17 +0000 (16:22 +0200)]

Correct misplaced CUDA timing event record

The event record happens after a cudaStreamWaitEvent is placed in the
non-local stream and therefore, in that stream, it would include the wait
time in the measurement. However, since timing cannot be performed with
DD / two streams due to the limitations of CUDA events, in practice this
was never an issue.

Change-Id: I2ca89c7acd461e480a324d40911dd4c6f5aac478

tomaskubar [Mon, 18 Sep 2017 16:57:19 +0000 (18:57 +0200)]

QM/MM: remove optimization and trans. state search

This functionality only worked with old versions of Orca,
had very limited use and possibly no longer works.
Optimization with QM/MM using the Gromacs drivers
(steep, cg, lbfgs) has already been tested,
and it will be added in a follow-up commit.
Transition state search functionality will not be available, however.

This is the beginning of an attempt to clean up the QM/MM code.
A version that works with Verlet nblists is already prepared
and will also follow.

Change-Id: Ice166dad0d4080821f1a20671fe698d549b3ce99

Aleksei Iupinov [Fri, 22 Sep 2017 12:41:02 +0000 (14:41 +0200)]

Move CUDA texture setup code from NB CUDA module to cudautils.cu

Change-Id: I7e47a65866c29be06ce522572e90a17c775157ab

Szilárd Páll [Thu, 21 Sep 2017 14:13:43 +0000 (16:13 +0200)]

Rename GPU launch/wait cycle counters

In preparation for the PME GPU task and GPU launch overhead to be
counted together in the same counter for all GPU tasks, the current main
counters have been renamed to be more general. The labels of GPU waits in
the performance table have also been renamed to reflect the task name.
Additionally, a non-bonded specific sub-counter has been added.

Change-Id: I65a15b0090c1ccebb300cf425c7b3be4100e17a0

Berk Hess [Sat, 23 Sep 2017 09:28:01 +0000 (11:28 +0200)]

Put rerun state preparation in a function

This is only code motion and some variable renaming.
Note that the vsite construction now constructs in the global state
pointer, but the old code asserted that this was identical to
the local state pointer.

Change-Id: I81d0bad0650eb781b68816e917814b2f97e2d529

Berk Hess [Wed, 6 Sep 2017 13:48:04 +0000 (15:48 +0200)]

Set global state on master rank only

The global state class was initialized on the master rank
and broadcast over all ranks. This led to unnecessary communication
and it was difficult to know what part of the global state was up to
date at different points in the code.
Now only the master rank has a global state object.
This change requires conditional access to the global state pointer
for the few algorithms that use it.
Replaced bcast_state() by a function that only broadcasts x and box.
Also made the box pointer constant in domdec initialization.

Change-Id: I924487863abe096eeb0b3cbc944b4ba32898ef03

Aleksei Iupinov [Thu, 21 Sep 2017 16:31:48 +0000 (18:31 +0200)]

Template the CUDA texture setup code on raw value type T

Change-Id: I252e1d68d263f4aca15f00863e9ed67213fdb22f

Aleksei Iupinov [Thu, 21 Sep 2017 15:51:49 +0000 (17:51 +0200)]

Eliminate unused Coulomb correction table size coulomb_tab_size

Change-Id: I35df7fc0590e12de8ad44791bce26cac1193065a

Aleksei Iupinov [Thu, 21 Sep 2017 15:43:17 +0000 (17:43 +0200)]

Eliminate duplicate CUDA texture setup code

init_ewald_coulomb_force_table() now uses initParamLookupTable(),
which is moved higher in the file.

Change-Id: I2cb799f1b4b78c650282be37b125b3da88e98f6c

Berk Hess [Tue, 12 Sep 2017 11:21:50 +0000 (13:21 +0200)]

Change nbnxn indexing to templates

Replaced the index lookup functions for SIMD pair list generation
with templates.
The last remaining ci_to_cj function and call will be removed in
a follow-up change.

Change-Id: I3222e616ac60f846d7c1e85685f915fba014290f
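
As an illustration of index lookup expressed through templates, the sketch
below maps an i-cluster index to the range of j-clusters covering the same
atoms. The function names are hypothetical and the arithmetic is a generic
derivation that assumes an i-cluster size of 4 and contiguous atom indexing;
it is not the code from this change.

```cpp
#include <cassert>

constexpr int c_iClusterSize = 4; // assumed i-cluster size

// First j-cluster that contains an atom of i-cluster ci.
template<int jClusterSize>
inline int firstJClusterOfICluster(int ci)
{
    static_assert(jClusterSize == 2 || jClusterSize == 4 || jClusterSize == 8,
                  "only j-cluster sizes 2, 4 and 8 are handled in this sketch");
    assert(ci >= 0);
    return (ci * c_iClusterSize) / jClusterSize;
}

// Last j-cluster that contains an atom of i-cluster ci.
template<int jClusterSize>
inline int lastJClusterOfICluster(int ci)
{
    static_assert(jClusterSize == 2 || jClusterSize == 4 || jClusterSize == 8,
                  "only j-cluster sizes 2, 4 and 8 are handled in this sketch");
    assert(ci >= 0);
    return (ci * c_iClusterSize + c_iClusterSize - 1) / jClusterSize;
}
```

Because the j-cluster size is a template parameter, the division reduces to
a shift at compile time and no run-time branch on the cluster size is needed.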

Roland Schulz [Fri, 15 Sep 2017 21:47:37 +0000 (14:47 -0700)]

ICC: Disable include path warning

Under certain circumstances MKL and PSTL can cause spurious
warnings. The warning is unlikely to be important.

Change Icdb24d0fed4060 previously removed this flag for C++.

Change-Id: I828c73c7139532dca6ee2d1279b0d1e109b0cf5c

Mark Abraham [Fri, 15 Sep 2017 08:39:52 +0000 (10:39 +0200)]

Update testing matrices to test with (latest) icc-18

Change-Id: Ia1da377b40fcaba4f364b77437dcddea702968a2

Aleksei Iupinov [Mon, 18 Sep 2017 13:37:44 +0000 (15:37 +0200)]

Skip GPU reallocation of an empty non-local pairlist

This makes the corresponding timing event's usage consistent.
The common early return condition is moved into a new header
nbnxn_gpu_common.h, shared by OpenCL/CUDA.

Change-Id: I5b9cc027676cccda69f5b844d07e94e5fda620e7

Szilárd Páll [Wed, 13 Sep 2017 18:37:39 +0000 (20:37 +0200)]

Minor refactoring of CUDA non-bonded kernels

Refactored the dynamic shared memory pointer assignment for improved
robustness. Also reverted some unnecessary leftover changes introduced
with the Volta-enabling and dynamic pruning that resulted in differences
between the legacy and normal CUDA kernels with the goal of improving
maintainability.

Change-Id: I4ce58f3e1f963935d7e3832ed0e537c94d81a632

Mark Abraham [Sun, 17 Sep 2017 21:43:42 +0000 (23:43 +0200)]

Merge branch release-2016

Kept a cmake variable description string from release-2016

Change-Id: I2077780afbbe3cc610d699ea498caa6959e4000c

Berk Hess [Wed, 14 Sep 2016 10:42:31 +0000 (12:42 +0200)]

Remove nb-parameters from t_forcerec

Removed all Coulomb and Van der Waals mdp parameters from t_forcerec.
The ones that are not used in the Verlet scheme (yet) have been added
to interaction_const_t. Now init_interaction_const() is called early
in init_forcerec().

Change-Id: I9ca14f2194742cc8b0aef09a42dbd75fa3f94517

Mark Abraham [Mon, 11 Sep 2017 16:55:00 +0000 (18:55 +0200)]

Express intent of matrix release build types explicitly

We intend to do primary testing in Jenkins with all assertions
enabled, so that we maximize our opportunities for useful feedback. We
do so in both Debug and Release flavours, so that we have confidence
that those different builds will remain maximally useful.

Expressing that we are building the Release+assert type explicitly
will help developers understand when and why they get feedback from
Jenkins.

This also permits us to build without assertions in a convenient
way. This is useful for e.g. checking that such builds remain free of
warnings, which is a useful corner case to have changed, since a
variable or parameter might only be used in an assertion expression.
This now happens in one post-submit and all release matrix
configurations.

Updated the release-matrix contents and documentation to be more
consistent with that of the other matrices.

Noted TODO for adding icc 17 testing.

Change-Id: I774602e361163337e9602f497e2302148ccbf544

Mark Abraham [Wed, 6 Sep 2017 08:38:55 +0000 (10:38 +0200)]

Bumped patch version for next release

Change-Id: I2d5758148734f746f9f03afd1ce3a92a02ab3355

Mark Abraham [Wed, 6 Sep 2017 08:36:54 +0000 (10:36 +0200)]

Version 2016.4

Change-Id: I1b8141d667629f77025b186364090e861fa3a428

Aleksei Iupinov [Thu, 14 Sep 2017 13:12:31 +0000 (15:12 +0200)]

Fix non-SIMD build failing

gmx_detect_simd() now returns an uppercase string,
making GMX_SIMD_ACTIVE uniformly uppercase.

Refs #2246

Change-Id: Idb1c2373dc8955ef8bdb36eb0361505238a117ff

Berk Hess [Thu, 14 Sep 2017 14:34:35 +0000 (16:34 +0200)]

Fix nbnxn SIMD 2xNN PME tabulated kernel

This kernel produced completely incorrect forces and slightly
incorrect energies.
This kernel was not (yet) selected by default in a 2016 release.

Fixes #2247

Change-Id: I297ad257932eaabe6cf84b17f34ad555921f48b0

Szilárd Páll [Tue, 29 Aug 2017 16:39:03 +0000 (18:39 +0200)]

CUDA 9/Volta support for the pruning kernel

Added syncwarp to prevent WAR hazard and changed any to any_sync.

Change-Id: I3d370e7c272dd0c2f0eaf8ee97f8797dd74a405d

Mark Abraham [Fri, 8 Sep 2017 16:21:12 +0000 (18:21 +0200)]

Form taskassignment module

Move existing code into new module, in preparation for an increase
in complexity of that new module.

Removed hyphenation of filename to move to more consistent style
overall.

Change-Id: I04e76048483b287485ac6872ed48ccba52df1e45

Mark Abraham [Mon, 11 Sep 2017 15:59:24 +0000 (17:59 +0200)]

Fix compat module definition

There needs to be a defgroup module_compat and whatever name it has
should match the name of the subdirectory. This needs to work
correctly before code in other modules can include a header from
compat module, else check-source will complain that a compat header
isn't documented as being used outside its module.

Change-Id: Ie6e17d92ecf7ede51e0e32af27bef732a52491f1

Erik Lindahl [Fri, 8 Sep 2017 19:03:58 +0000 (21:03 +0200)]

Improve accuracy of SIMD exp for small args

Introduce a separate check for small arguments and
set the result to zero below this cutoff. This change
also moves the exponential tests to ldexp(), which
as a side-effect also makes that function safer by
default, with an optional template parameter to avoid
the checks.

Fixes #2243.

Change-Id: I41bccaeec9921c3aead2cd2caf41cbe2206e0687

Mark Abraham [Tue, 12 Sep 2017 10:01:48 +0000 (12:01 +0200)]

Merge branch release-2016

No conflicts

Change-Id: Iababf3d2439f1b465ebdf8bcb3d29815cb7830e1

Aleksei Iupinov [Wed, 6 Sep 2017 15:18:44 +0000 (17:18 +0200)]

Prevent reference XML reader segfault on nullptr error strings

It is possible for TinyXML2 to return null pointers to the error strings.
This could cause XML reference data reader to segfault.

Refs #2241

Change-Id: I8b72917785080023f75388281dd2cbb4f30da925
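
Constructing a std::string from a null char pointer is undefined behaviour,
which is how a null error string can turn into a segfault. A guard along the
following lines avoids it; the helper name toStringOrEmpty is hypothetical
and the actual fix in the commit may look different.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts a possibly-null C string into a std::string.
inline std::string toStringOrEmpty(const char* text)
{
    // std::string(nullptr) is undefined behaviour, so test before converting.
    return text != nullptr ? std::string(text) : std::string();
}
```

Any error string obtained from the XML library would be passed through such
a helper before being formatted into a message.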

Berk Hess [Fri, 8 Sep 2017 13:35:23 +0000 (15:35 +0200)]

Fix exception in SIMD LJ PME solve

Clear SIMD padding elements in solve helper arrays to avoid
otherwise harmless fp overflow exceptions.

Fixes #2242

Change-Id: I97e67c4fcc2ef361f54d1627fd0dab4621f4bd33

Berk Hess [Thu, 7 Sep 2017 17:52:29 +0000 (19:52 +0200)]

Do not pass state to init_replica_exchange

init_replica_exchange only used the state to check natoms,
so now we pass only that.

Change-Id: Ie75ca26b9899006f9ca4584c42391a50758c7ed5

Szilárd Páll [Mon, 11 Sep 2017 17:33:09 +0000 (19:33 +0200)]

Merge branch 'release-2016'

Note that changes to the simd_math module/tests from the 2016 branch
were omitted in favor of the current code in master.

Conflicts:
src/gromacs/fileio/oenv.cpp
src/gromacs/fileio/oenv.h
src/gromacs/gmxlib/nonbonded/CMakeLists.txt
src/gromacs/hardware/cpuinfo.cpp
src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_kernel.cuh
src/gromacs/simd/simd_math.h
src/gromacs/simd/tests/simd_math.cpp
src/gromacs/swap/swapcoords.cpp

Change-Id: I357e40f97fd53a34ff900f40bb3fdeb20d864c13

Szilárd Páll [Mon, 4 Sep 2017 15:26:59 +0000 (17:26 +0200)]

NVIDIA Volta performance tweaks

Removed ballot syncs and replaced all computed masks with full warp
mask (as all branches in question are warp-synchronous).
This improves performance by 7-12%.

Change-Id: I769d6d8f0d171eb528d30868d567624d5e246dbf

Berk Hess [Fri, 8 Sep 2017 12:49:40 +0000 (14:49 +0200)]

Fix PBC bugs in the swap code

Fixes #2245

Change-Id: I90e2bed71a2499c63794e420dca383d91e6fc86c

Berk Hess [Sat, 22 Apr 2017 09:41:54 +0000 (11:41 +0200)]

Avoid inf in SIMD double sqrt()

Arguments >0 and <float_min to double precision SIMD sqrt()
would produce inf on many SIMD architectures. Now sqrt() will
return 0 for arguments in this range, which is not fully correct,
but should be unproblematic.
Updated the tests to check for this range and to produce output
that checks all double precision mantissa bits.

Fixes #2164.
Refs #2163.

Change-Id: Ic6d2c6d4102d602703b40e7e8bcc1974a7283f7c

Roland Schulz [Thu, 7 Sep 2017 00:40:06 +0000 (17:40 -0700)]

Fix compiler warning

Change-Id: Ic665baecec83f89bebb0fddea5385ae88c075adb

Mark Abraham [Mon, 4 Sep 2017 07:49:59 +0000 (09:49 +0200)]

Introduce type-safe GPU emulation variables

It's less clearly correct to pass around many variables of bool type,
as often happens in the high-level code.

Change-Id: I9ddb35fa8789e3eb5726fc0143a580493737bc29
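
The benefit of replacing bool parameters with dedicated types can be seen in
a small sketch. The enum and function names below are illustrative
assumptions, not necessarily the ones introduced by the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical type-safe flags; each has exactly two states, like a bool.
enum class UsePhysicalGpu : bool
{
    No,
    Yes
};
enum class EmulateGpuNonbonded : bool
{
    No,
    Yes
};

// Hypothetical high-level function taking both flags.
inline std::string describeNonbondedSetup(UsePhysicalGpu useGpu, EmulateGpuNonbonded emulate)
{
    // In this sketch, emulation and a physical GPU are mutually exclusive.
    assert(!(useGpu == UsePhysicalGpu::Yes && emulate == EmulateGpuNonbonded::Yes));
    if (useGpu == UsePhysicalGpu::Yes)
    {
        return "physical GPU";
    }
    if (emulate == EmulateGpuNonbonded::Yes)
    {
        return "emulated GPU";
    }
    return "CPU";
}
```

With two bool parameters, swapping the arguments at a call site compiles
silently; with the enum classes it is a compile error.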

Roland Schulz [Mon, 31 Jul 2017 23:56:00 +0000 (16:56 -0700)]

Add isIntegralConstant type trait

Add static assert to pme-gather to make overload safer

Change-Id: I71c67f3ee40185e31796752d090d2aa9f0918ec8
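
A trait of this kind can be written with one partial specialization. The
sketch below is an assumed form, and splineOrder is a hypothetical function
that only illustrates how a static assert can guard an overload.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: an arbitrary type is not an integral constant of type Int.
template<typename T, typename Int>
struct isIntegralConstant : std::false_type
{
};

// Specialization: matches std::integral_constant<Int, value> for any value.
template<typename Int, Int value>
struct isIntegralConstant<std::integral_constant<Int, value>, Int> : std::true_type
{
};

// Hypothetical function that accepts only compile-time integer constants.
template<typename Order>
int splineOrder(Order order)
{
    static_assert(isIntegralConstant<Order, int>::value,
                  "Order must be std::integral_constant<int, N>");
    return order; // integral_constant converts implicitly to its value
}
```

Calling splineOrder(4) with a plain int then fails at compile time with the
message from the static assert instead of selecting an unintended overload.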

Mark Abraham [Wed, 6 Sep 2017 10:31:52 +0000 (12:31 +0200)]

Enable group-scheme SIMD kernels on recent AVX extensions

The group-scheme code only runs using the feature set of AVX_256, but
that is supported on the more recent hardware, so we should have the
group scheme run with the maximum suitable SIMD. Otherwise people will
need AVX_256 binaries for the group scheme, and other binaries for the
other support.

Change-Id: I7728e1d97998509368dfc456dc9c33489b6dfee5

Berk Hess [Thu, 7 Sep 2017 08:05:33 +0000 (10:05 +0200)]

Fix FEP state with rerun

When using FEP states with rerun, the FEP state was always 0.

Fixes #2244

Change-Id: I457bf444f6c7f8fd357212416311625981b833e6

Mark Abraham [Mon, 4 Sep 2017 21:53:49 +0000 (23:53 +0200)]

Prepare Verlet scheme later in runner

As noted in the TODO, preparing the Verlet scheme can wait until we
need rlist chosen so that we can set up the domain
decomposition. Moving it later in mdrun helps clarify the unrelated
process of setting up thread-MPI, and also means that the state used
when preparing the Verlet scheme is the one from the checkpoint file,
where applicable.

Moved a warning for the group scheme on BG/Q to a more appropriate
place.

Noted some TODOs for managing kinds of state better in future.

Change-Id: I99a014d9465b46b32038cfdcee99dc3896d2f265

Roland Schulz [Thu, 20 Jul 2017 22:07:49 +0000 (15:07 -0700)]

PME-gather: Use templated functor instead of preprocessor

Added restrict in several places, but this does not affect performance
with gcc and icc.

Change-Id: Id366621fa3ad02ca182b8a4da48cae940059cf46
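
The general pattern of replacing preprocessor-generated variants with a
templated functor can be sketched as below. WeightedSum and weightedSum are
hypothetical names, and the choice of specialised orders 4 and 5 is an
assumption made for the example; the real gather code does considerably more.

```cpp
#include <cassert>

// Hypothetical functor: the loop length is a compile-time constant.
template<int order>
struct WeightedSum
{
    double operator()(const double* grid, const double* weights) const
    {
        double sum = 0;
        for (int i = 0; i < order; i++) // trip count known at compile time
        {
            sum += grid[i] * weights[i];
        }
        return sum;
    }
};

// Dispatches from a run-time order to the compile-time instantiations.
inline double weightedSum(int order, const double* grid, const double* weights)
{
    switch (order)
    {
        case 4: return WeightedSum<4>()(grid, weights);
        case 5: return WeightedSum<5>()(grid, weights);
        default:
        {
            // Generic fallback with a run-time loop length.
            assert(order > 0);
            double sum = 0;
            for (int i = 0; i < order; i++)
            {
                sum += grid[i] * weights[i];
            }
            return sum;
        }
    }
}
```

Unlike a macro expansion, each instantiation is ordinary code that a
debugger can step through and that the compiler type-checks once.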

Christoph Junghans [Sun, 13 Aug 2017 15:52:18 +0000 (09:52 -0600)]

Supported quiet trajectory-handling I/O

Permits GMX_TRAJECTORY_IO_VERBOSITY=0 to be set to keep frame-reading
code quiet, which is convenient for tools using libgromacs.

Change-Id: I873dcf229d9c20c8dd3b5097784236c2c2c478c1
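
A tool linking against libgromacs would only need to set the variable in its
environment. How the library might interpret it can be sketched as follows;
the function names are hypothetical, and treating only the value "0" as
quiet is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true when the given setting requests quiet I/O.
inline bool isQuietSetting(const char* value)
{
    return value != nullptr && std::strcmp(value, "0") == 0;
}

// Reads the environment variable named in the commit message.
inline bool trajectoryIoIsQuiet()
{
    return isQuietSetting(std::getenv("GMX_TRAJECTORY_IO_VERBOSITY"));
}
```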

Mark Abraham [Sun, 3 Sep 2017 13:59:12 +0000 (15:59 +0200)]

Fixes for Solaris

Suggested by Maureen Chew, of Oracle.

Change-Id: I5c3f868721944e7586b7001f3ffdf6ab17953526

Mark Abraham [Mon, 4 Sep 2017 21:27:04 +0000 (23:27 +0200)]

Fix use of GPU information for thread-MPI setup

Tweaked the logic for initially setting tryUsePhysicalGpu, since it
and forceUsePhysicalGpu are intended to be mutually exclusive. Note
that -gpu_id still works with both -nb auto and -nb gpu; what we
also need are some checks that -nb, -gpu_id and GMX_EMULATE_GPU
are mutually consistent, which is now true. We also don't need
to set tryUsePhysicalGpu based on GMX_GPU_NONE, since that will
take care of itself when no GPUs are detected.

Introduced nonbondedOnGpu to allow the logic that is aware of when
GPUs are supported / useful to set a variable that controls the
defaults for thread-MPI rank split, and npme choice. This restores
some of the approach removed in e27440a4923e, and fixes a bug that was
introduced there.

Change-Id: I42aa9e6c36fef584437ece9ef4d3a8a249cd58db

Erik Lindahl [Thu, 25 May 2017 13:05:36 +0000 (15:05 +0200)]

More SIMD math argument checking, added unsafe options

This change adds more argument checking and safeguards
for sqrt, exp2, and exp-related SIMD math functions, and
properly documents allowed values. These functions now
have an (optional) template parameter that makes it possible
to avoid the checks where it is important to save every cycle,
and the developer is certain that this usage is fine. For
now we only use the unsafe versions in the nonbonded kernels.
The SIMD function test code has also been extended with options
to allow denormals to be considered zero.

Fixes #2164.
Refs #2163.

Change-Id: I93ddadf74dd0fa013f61cf27fd1993f11cde28bc
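
The optional template parameter can be illustrated with a scalar stand-in.
The names MathOptimization and checkedSqrt are used here for illustration
only, and the scalar std::sqrt does not reproduce the SIMD behaviour; the
sketch shows just the shape of the interface, with the check on by default.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Illustrative switch between checked and unchecked variants.
enum class MathOptimization
{
    Safe,
    Unsafe
};

// Scalar stand-in for a double-precision SIMD sqrt with argument checking.
template<MathOptimization opt = MathOptimization::Safe>
inline double checkedSqrt(double x)
{
    // Safe variant: positive arguments below float_min give 0.
    if (opt == MathOptimization::Safe && x > 0.0 && x < FLT_MIN)
    {
        return 0.0;
    }
    return std::sqrt(x);
}
```

Callers that have verified their argument range would write
checkedSqrt<MathOptimization::Unsafe>(x) to skip the comparison; since opt
is a compile-time constant, the check disappears entirely in that variant.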

Berk Hess [Wed, 9 Aug 2017 15:07:11 +0000 (17:07 +0200)]

Fix two nbnxn search indexing bugs

The conversion for nbnxn i-cluster indexing to j-cluster indexing
in the SIMD cluster list generation code was incorrect for the
(common) case where the j-cluster size is 8 (i is always 4).
This bug was compensated by a bug that caused the input j-cluster
range, in terms of i-cluster size, to be too wide by 1 at both ends.
This change fixes both bugs, which leads to slightly fewer distance
checks.
Note that because of the compensating bugs, the pair-lists have
always been fully correct.

Change-Id: I79ff95d71880dc53305dbbf9282487895be71ed9

Roland Schulz [Tue, 29 Aug 2017 21:11:01 +0000 (14:11 -0700)]

Fix accuracy for cvtDouble2Float/cprod

These gave incorrect results with ICC17u4 after commit b905792.

Change-Id: Ib945e979e9e3144342eed7a5cb4fc56d7f9a4b88

Berk Hess [Fri, 25 Aug 2017 20:25:18 +0000 (22:25 +0200)]

Consolidate architecture booleans

The booleans used in the hardware module for identifying architectures
are replaced by a single enum in architecture.h. The duplicate
preprocessor code in cpuinfo.cpp now also uses this enum instead.
Note that hardwareinfo.cpp did not check for __i386, x86, __amd64__
and _M_AMD64, I don't know if this caused issues.

Also moved gpu_detect_res_str to new file gpu_hw_info.cpp.

Change-Id: I812482854d346c9290b0a428dada88072dc5a707

Aleksei Iupinov [Mon, 4 Sep 2017 11:13:40 +0000 (13:13 +0200)]

Merge "Merge branch release-5-1 into release-2016" into release-2016

Bernhard M. Wiedemann [Mon, 4 Sep 2017 09:32:31 +0000 (11:32 +0200)]

Use cmake's timestamp function

also use ISO date format and UTC timezone
to make it nicer to work with.

requires cmake-2.8.11+

Change-Id: I00fbf89c727624ad6ba89833ef146c2e16e2fee8

Teemu Murtola [Tue, 22 Aug 2017 19:12:04 +0000 (22:12 +0300)]

Introduce self-pairs search in nbsearch

Make it possible to search for all pairs within a single set of
positions using AnalysisNeighborhood. This effectively excludes half of
the pairs from the search, speeding things up.

Not used yet anywhere, but this makes the code a better reference for
performance comparisons, and for places where this is applicable it has
potential for speeding things up quite a bit.

Change-Id: Ib0e6f36460b8dbda97704447222c864c149d8e56
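
Why a self-pairs search visits only half of the pairs can be shown with a
brute-force sketch. It ignores periodic boundaries and uses no grid, and the
names Position and findSelfPairs are hypothetical; it is not the
AnalysisNeighborhood implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Position
{
    double x, y, z;
};

// Returns each unordered pair (i, j) with i < j whose distance is within cutoff.
inline std::vector<std::pair<std::size_t, std::size_t>>
findSelfPairs(const std::vector<Position>& positions, double cutoff)
{
    assert(cutoff >= 0);
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    const double cutoff2 = cutoff * cutoff;
    for (std::size_t i = 0; i < positions.size(); i++)
    {
        // Starting at i + 1 skips both (i, i) and the mirrored pair (j, i).
        for (std::size_t j = i + 1; j < positions.size(); j++)
        {
            const double dx = positions[i].x - positions[j].x;
            const double dy = positions[i].y - positions[j].y;
            const double dz = positions[i].z - positions[j].z;
            if (dx * dx + dy * dy + dz * dz <= cutoff2)
            {
                pairs.emplace_back(i, j);
            }
        }
    }
    return pairs;
}
```

Searching the same set against itself as two independent sets would report
every pair twice and each position against itself as well.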

Teemu Murtola [Tue, 22 Aug 2017 19:09:49 +0000 (22:09 +0300)]

Refactor analysis nbsearch tests

Remove assumptions from the tests about the order in which pairs are
returned. Prepares the tests for an all-pairs search from a single set
of positions, where the order is not as predictable. And the test code
is actually easier to understand this way.

Change-Id: Id33eaff1c4c7f94a26099c6d4e34e7e008c1afa4

Mark Abraham [Sun, 3 Sep 2017 13:48:10 +0000 (15:48 +0200)]

Fix compiler flags for using MKL

Change-Id: I860e43da8de3b563167ecee1f52e2017a0f27f7f

Mark Abraham [Sun, 3 Sep 2017 14:51:20 +0000 (16:51 +0200)]

Merge branch release-5-1 into release-2016

Change-Id: I047ad1ab813b69582bdf5f4681549171724323f6

Berk Hess [Wed, 30 Aug 2017 21:14:01 +0000 (23:14 +0200)]

Put mdrun options in structs

Collects all mdrun options contained in flags, plus a few more,
into a new struct MdrunOptions with sub-structs.
All boolean options still use gmx_bool, and not bool, because
the command line parsing does not support bool yet.
Moved function declarations for expanded.cpp to expanded.h.
More options can be collected, but that would increase the size
of this change even more.

Change-Id: I21ea07b443e89cbfa21986bb8bd58a5657cbfbfa

M. Eric Irrgang [Wed, 30 Aug 2017 01:50:25 +0000 (21:50 -0400)]

Remove macro that confuses some debuggers

Replace FF macro with a lambda capturing a copy of Flags
to test for PCA flag bits set.

Change-Id: I9e87dc883d30c9c54115f2e47aa60d643f6061eb
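
The replacement of a flag-testing macro by a lambda can be sketched like
this. The flag constants and the function countRequestedFeatures are
hypothetical; only the lambda capturing a copy of Flags mirrors the
description above.

```cpp
#include <cassert>

// Hypothetical flag bits standing in for the PCA_* constants.
constexpr unsigned long c_flagCanTime    = 1ul << 0;
constexpr unsigned long c_flagCanView    = 1ul << 1;
constexpr unsigned long c_flagKeepArgs   = 1ul << 2;

inline int countRequestedFeatures(unsigned long Flags)
{
    // Captures Flags by value, so later changes to the variable cannot
    // alter what the test sees.
    auto isFlagSet = [Flags](unsigned long flag) { return (Flags & flag) == flag; };

    int count = 0;
    if (isFlagSet(c_flagCanTime))
    {
        count++;
    }
    if (isFlagSet(c_flagCanView))
    {
        count++;
    }
    if (isFlagSet(c_flagKeepArgs))
    {
        count++;
    }
    return count;
}
```

A lambda has a name and a scope that a debugger can display, whereas a macro
has already been expanded away by the time the code is compiled.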

Berk Hess [Wed, 30 Aug 2017 19:52:49 +0000 (21:52 +0200)]

Put all domdec options in a struct

Change-Id: I40c3cb979ea26188c4484138b1041f44ff9d5ad6

Berk Hess [Fri, 1 Sep 2017 21:10:34 +0000 (23:10 +0200)]

Fix mdrun -nb option pointer

Change-Id: I576a68161df6ebd071d58b8afef2268caac5524c

M. Eric Irrgang [Sun, 27 Aug 2017 01:34:09 +0000 (21:34 -0400)]

make a box parameter const

Change-Id: I6d9f472ac73f8d9fdde5b4dbd7ffdfc18544f634

Berk Hess [Fri, 1 Sep 2017 21:10:34 +0000 (23:10 +0200)]

Fix mdrun -nb option pointer

Change-Id: I576a68161df6ebd071d58b8afef2268caac5524c

Aleksei Iupinov [Thu, 31 Aug 2017 13:56:51 +0000 (15:56 +0200)]

Bring back space lost in commit e43e83bf

Change-Id: Ie1c644fd372954d16d8bb9a098191e3ac072a226

M. Eric Irrgang [Wed, 30 Aug 2017 01:41:56 +0000 (21:41 -0400)]

Add more git ignores

Make it easier to work with some editors and packagers.

Change-Id: I582f76d4e160c02c3614219805c7ccd12af4de55

M. Eric Irrgang [Wed, 30 Aug 2017 01:35:38 +0000 (21:35 -0400)]

normalize some C header includes

Change-Id: I26f10375d7021c01718c2b0d26251963509880e2

M. Eric Irrgang [Wed, 30 Aug 2017 01:46:52 +0000 (21:46 -0400)]

small precaution in case *argc == 0

Change-Id: Ia33ee57ea9143348a3ad4b7c3266ee9452dd32ee

M. Eric Irrgang [Wed, 30 Aug 2017 01:44:45 +0000 (21:44 -0400)]

Declare an input parameter const

Change-Id: I9fcb156256fa58853bc01f9b0c9fe2d08cf10057

Aleksei Iupinov [Tue, 29 Aug 2017 15:57:00 +0000 (17:57 +0200)]

Remove leftover function declaration

Commit a51b6feb removed gmx_check_hw_runconf_consistency(),
but not its declaration.

Change-Id: I82da91f735014ac131f8147950a2cb1114f73757

Aleksei Iupinov [Tue, 29 Aug 2017 15:15:35 +0000 (17:15 +0200)]

Fix typo

Change-Id: I94a630cd45f634f9e31478a330e28599b7a74100

Roland Schulz [Fri, 25 Aug 2017 01:26:34 +0000 (18:26 -0700)]

Enable missing declarations warning

Change-Id: I055e12554e40fea43d37590ea3989d0a8fbe732d

Szilárd Páll [Mon, 28 Aug 2017 15:26:38 +0000 (17:26 +0200)]

Fix typo and tweak env var docs

Change-Id: I78005b3fb4923ea4f81229640327d47e6c182303

Aleksei Iupinov [Fri, 25 Aug 2017 14:16:19 +0000 (16:16 +0200)]

Allow passing optional width argument into CUDA shuffle intrinsics

Change-Id: I207d8a7f94bf317e34ae4ff8cdb963fc96890260

Berk Hess [Tue, 8 Aug 2017 11:41:16 +0000 (13:41 +0200)]

Make pull potential registration thread-safe

Note that registration is currently not used and with a single
potential provider (such as AWH) there would be no issues.
This is for general thread safety.

Change-Id: I2e9be073655bfbe83cdc93a0f74b26e0b2e39dca
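
One common way to make a registration step thread-safe is to guard the
shared list with a mutex. The class below is a generic sketch under that
assumption; PotentialRegistry and its members are hypothetical and do not
reflect the pull code's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical registry of external potential providers.
class PotentialRegistry
{
public:
    // Returns false if a provider with this name was already registered.
    bool registerProvider(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(mutex_); // serialises concurrent callers
        for (const std::string& existing : providers_)
        {
            if (existing == name)
            {
                return false;
            }
        }
        providers_.push_back(name);
        return true;
    }

    std::size_t count() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return providers_.size();
    }

private:
    mutable std::mutex       mutex_;
    std::vector<std::string> providers_;
};
```

With a single provider the lock is uncontended, so its cost is negligible.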

Berk Hess [Mon, 28 Aug 2017 15:28:18 +0000 (17:28 +0200)]

Set default GMX_OPENMP_MAX_THREADS to 64

As there are many new CPUs with more than 32 hardware threads and
GROMACS scales quite well to more than 32 threads,
GMX_OPENMP_MAX_THREADS is increased from 32 to 64 threads.
The performance impact of this is that bitmasks are by default
64-bit instead of 32-bit integers, which on 64-bit systems should
only have a (negligible) effect on cache pressure.

Change-Id: I73d1c79e86f30f7fc69e1f49e1195271435e77b6
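
The link between the thread limit and the bitmask width can be sketched as
follows. The constant c_maxThreads stands in for GMX_OPENMP_MAX_THREADS, and
the type alias and helper functions are hypothetical rather than the
GROMACS bitmask interface.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

constexpr int c_maxThreads = 64; // stands in for GMX_OPENMP_MAX_THREADS

// One bit per thread: 32-bit storage suffices up to 32 threads, otherwise 64-bit.
using Bitmask = std::conditional<(c_maxThreads <= 32), std::uint32_t, std::uint64_t>::type;

static_assert(c_maxThreads <= 64, "this sketch handles at most 64 threads");

inline void setBit(Bitmask* mask, int thread)
{
    assert(thread >= 0 && thread < c_maxThreads);
    *mask |= static_cast<Bitmask>(1) << thread;
}

inline bool isBitSet(Bitmask mask, int thread)
{
    assert(thread >= 0 && thread < c_maxThreads);
    return ((mask >> thread) & static_cast<Bitmask>(1)) != 0;
}
```

Doubling the limit doubles the size of every stored mask, which is the
cache-pressure effect mentioned in the message.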

Szilárd Páll [Mon, 28 Aug 2017 12:31:25 +0000 (14:31 +0200)]

Avoid division by zero in pruning part calculation

Added assert on manually set pruning interval.

Change-Id: I438f98a0a42335d8f79bf7604b53b51943e7db8c

Aleksei Iupinov [Thu, 24 Aug 2017 15:56:25 +0000 (17:56 +0200)]

Adjust PME LJ solver test input coefficient

The previous value caused one of the unit tests to fail predictably
on ARM_NEON SIMD. The coefficient was so low that it caused a specific
grid value to hover at the GMX_FLOAT_MIN threshold, which is used
to allow using the same test reference data for single/double precision.

Ref #2234

Change-Id: Ia1aa51ead263e82487585abb167c4d080fd813ac

Roland Schulz [Thu, 24 Aug 2017 20:56:21 +0000 (13:56 -0700)]

Enable unused function warning

Change-Id: Id00efec9fd2e4edbae328cc291435c202b327492

Erik Lindahl [Thu, 25 May 2017 18:01:57 +0000 (20:01 +0200)]

Improved SIMD test data to use all bits

The SIMD test data now uses all available bits in the
current precision floating-point numbers, to catch errors
where we lose precision or have mistakes in single/double
conversions (e.g. return values). Since some operations now
depend on the least significant bit, a few tests have been
relaxed to allow a tolerance rather than require exact
binary matching.

The new tests caught a precision loss in a reduction
function for the SIMD-mimicking scalar functions, but this
has never been used apart from testing, and should be harmless.

Fixes #2163.

Change-Id: I6d8f19d2aafeee8f42a2034c6100fcb10f6d2e81
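
A tolerance expressed in units in the last place (ULPs) is one way to relax
exact binary matching. The sketch below assumes 64-bit IEEE doubles and uses
hypothetical function names; it is not the comparison code used by the
GROMACS tests.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Maps a double to an integer whose ordering matches the ordering of the doubles.
inline std::int64_t orderedBits(double x)
{
    static_assert(sizeof(double) == sizeof(std::int64_t), "expects 64-bit doubles");
    std::int64_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    // Negative doubles have the sign bit set; mirror them below zero.
    return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Number of representable doubles between a and b (0 when they are equal).
inline std::uint64_t ulpDistance(double a, double b)
{
    assert(a == a && b == b); // NaN is not supported in this sketch
    const std::int64_t ia = orderedBits(a);
    const std::int64_t ib = orderedBits(b);
    return ia >= ib ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
                    : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}

inline bool equalWithinUlps(double a, double b, std::uint64_t maxUlps)
{
    return ulpDistance(a, b) <= maxUlps;
}
```

A test that depends on the least significant bit could then accept results
within a small number of ULPs instead of requiring identical bit patterns.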

Berk Hess [Tue, 8 Aug 2017 13:19:09 +0000 (15:19 +0200)]

Avoid bonded forces loops/timers on inputs with no bondeds

This excludes near-zero "Bonded F" and "Listed buffer ops." subcounters
on such inputs.
Moved the bonded loop and thread parallelization to a separate function.
Also made the lambda input array const.

Change-Id: I862fc601e8adcf6d0b1eb7bd88390b0ea073e4fb

Mark Abraham [Fri, 25 Aug 2017 16:17:19 +0000 (18:17 +0200)]

Fix flat-bottomed position restraints with multiple ranks

Reallocation was never done for flat-bottomed restraints,
so the indexing could go out of range, leading to segfaults.

Fixes #2236

Change-Id: I866f96684fc5a2fef6391ed62a70abdaa1581a33

Mark Abraham [Mon, 21 Aug 2017 16:11:05 +0000 (18:11 +0200)]

Stopped checkpoint influencing parallelism setup

There's no strong reason to prefer that by default the parallelism
setup is the same as it was for the run in the checkpoint. If the next
run has the same number of ranks, then either we should follow the
user's instructions for the second run, or apply the defaults. This
permits us to fix bugs in the default settings for patch releases, and
have it work. It also prepares for future auto-tuning when we hope to
be able to adapt to the run-time environment, which will become
increasingly volatile, between and within runs. And of course it
simplifies the setup code slightly.

Change-Id: I41d65132f0855cd57a0a572d4579045bf10f295e

Mark Abraham [Mon, 21 Aug 2017 13:27:33 +0000 (15:27 +0200)]

Run GPU detection even for mdrun -nb cpu

The complexity of managing and documenting the setup-time optimization
of whether we do GPU detection in this case is not worth the small
advantage in the case where the user compiled a GPU binary and
required the use of CPUs.

Removed some comments that were either duplicates of the code or
likely to rot.

Change-Id: Ib9c234b55026981090fd064950e361eba21d198e

Mark Abraham [Tue, 22 Aug 2017 15:09:50 +0000 (17:09 +0200)]

Fixed thread-MPI with non-default -npme

Thread-MPI currently defaults to zero PME-only ranks, but should
support non-default specifications.

Enforced that e.g. mdrun -ntmpi 0 -npme 1 -gpu_id 0 is not
supported, because we don't have the ability to decide how
to distribute threads to the different kinds of ranks.

Change-Id: I5f175fc087c10d4268e6d8226ba1628e99d376fc

Mark Abraham [Tue, 22 Aug 2017 14:06:03 +0000 (16:06 +0200)]

Fix missing declaration of GMX_UNUSED_VALUE

Probably a recent merge from release-2016 plus the addition of pruning
code, perhaps following cleanup in the master branch, created a situation
where a source file included cuda_arch_utils.cuh without this transitive
include.

Change-Id: I7bdfe35f9655beceec8aafc030d6cf8fac4c71b2

Aleksei Iupinov [Mon, 21 Aug 2017 13:45:03 +0000 (15:45 +0200)]

Separate PME spread and gather wallcycle counters

Change-Id: If9d1bcac8b07d0ea09ac57c254e1ca30fbe78d31

Berk Hess [Fri, 24 Mar 2017 13:39:16 +0000 (14:39 +0100)]

Enable dynamic pair list pruning

This change activates the dynamic pruning scheme and the pruning-only
kernels added in previous commits.
A heuristic estimate is used to select values for nstlist and
nstlistPrune that should result in performance that is reasonably
close to optimal. The nstlist increase code has been moved from
runner.cpp to nbnxn_tuning.cpp. The KNL check in that code has been
replaced by a check for Xeon Phi.
A paragraph has been added to the manual to describe the dynamic
and rolling list pruning scheme. A reference with all the details
will be added once the paper has been published.

Change-Id: Ic625858a07083916c8aa3e07f7497488dcfaee9e
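A minimal sketch of the idea behind list pruning, not the GROMACS kernels themselves. It assumes a dual-cutoff setup: an outer pair list is built with a larger cutoff at every nstlist steps, and a cheaper pruning pass at every nstlistPrune steps keeps only the pairs currently within a smaller inner cutoff. The names AtomPair, buildOuterList, pruneList, rlistOuter and rlistInner are hypothetical, and periodic boundaries and clustering are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical pair record; real kernels work on clusters of atoms, not single pairs.
struct AtomPair
{
    std::size_t i;
    std::size_t j;
};

// Squared distance between two points (no periodic boundaries in this sketch).
static double distanceSquared(const Vec3& a, const Vec3& b)
{
    double d2 = 0.0;
    for (std::size_t d = 0; d < 3; ++d)
    {
        const double diff = a[d] - b[d];
        d2 += diff * diff;
    }
    return d2;
}

// Expensive search over all pairs, assumed to run once every nstlist steps.
std::vector<AtomPair> buildOuterList(const std::vector<Vec3>& x, double rlistOuter)
{
    std::vector<AtomPair> outer;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        for (std::size_t j = i + 1; j < x.size(); ++j)
        {
            if (distanceSquared(x[i], x[j]) < rlistOuter * rlistOuter)
            {
                outer.push_back({ i, j });
            }
        }
    }
    return outer;
}

// Cheap pass over the existing outer list, assumed to run every nstlistPrune steps.
// It only re-checks distances for pairs already in the outer list.
std::vector<AtomPair> pruneList(const std::vector<AtomPair>& outer,
                                const std::vector<Vec3>&     x,
                                double                       rlistInner)
{
    std::vector<AtomPair> inner;
    for (const AtomPair& pair : outer)
    {
        if (distanceSquared(x[pair.i], x[pair.j]) < rlistInner * rlistInner)
        {
            inner.push_back(pair);
        }
    }
    return inner;
}
```

The force kernel would then loop over the pruned list only, so the cost of the larger outer buffer is paid in the pruning pass and not at every step.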

Mark Abraham [Mon, 21 Aug 2017 23:51:09 +0000 (01:51 +0200)]

Merge branch release-2016

Made a matching change for ARM NEON SIMD to the newly refactored SIMD code.

Trivial resolutions of adjacent changes in CUDA kernel code.

Adjacent resolutions for changes for dynamic pruning and disabling
of PME tuning for the group scheme need checking.

Change-Id: I024878fa50ba815960d00ad6e811af181323b4db

Teemu Murtola [Sat, 19 Aug 2017 18:59:05 +0000 (21:59 +0300)]

Simple union-find structure

Implement a simple data structure that, given a list of items
(represented by integer indices), keeps track of a partitioning of these
items into disjoint sets. Operations to merge two such sets (given two
member items) and to query the "index" of a set are amortized constant
time. This is a well-known "Union-Find" data structure.

Change-Id: I0bd907755d709776eac416332a4a1bcd139f135f
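A minimal sketch of such a structure, assuming path halving and union by rank as the balancing strategy. The class and method names (UnionFind, findRoot, merge) are hypothetical and are not taken from the GROMACS source.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration of a union-find (disjoint-set) structure over
// items identified by the integer indices 0..count-1.
class UnionFind
{
public:
    // Initially every item is in its own single-element set.
    explicit UnionFind(int count) : parent_(count), rank_(count, 0)
    {
        std::iota(parent_.begin(), parent_.end(), 0);
    }

    // Returns the representative ("index") of the set containing item.
    // Path halving shortens the chain to the root on every query.
    int findRoot(int item)
    {
        while (parent_[item] != item)
        {
            parent_[item] = parent_[parent_[item]];
            item          = parent_[item];
        }
        return item;
    }

    // Merges the sets containing a and b; does nothing if they already match.
    void merge(int a, int b)
    {
        int rootA = findRoot(a);
        int rootB = findRoot(b);
        if (rootA == rootB)
        {
            return;
        }
        // Attach the shallower tree under the deeper one (union by rank).
        if (rank_[rootA] < rank_[rootB])
        {
            std::swap(rootA, rootB);
        }
        parent_[rootB] = rootA;
        if (rank_[rootA] == rank_[rootB])
        {
            ++rank_[rootA];
        }
    }

private:
    std::vector<int> parent_;
    std::vector<int> rank_;
};
```

Two items belong to the same set exactly when findRoot() returns the same value for both, which is how a caller would query the partitioning.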

Mark Abraham [Wed, 12 Jul 2017 12:03:55 +0000 (14:03 +0200)]

Moved increase_nstlist() to nbnxn_tuning.h

No functionality changes, merely code movement. This prepares for a
child commit that changes how increase_nstlist() works.

Change-Id: Ide8ed1774d0ba13957698645cee6397cbc39d231

Aleksei Iupinov [Thu, 5 Jan 2017 13:49:47 +0000 (14:49 +0100)]

PME solving tests

Unit tests for PME solving on CPU. Test 2 grid sizes, 2 input grids,
normal and triclinic boxes, normal and LJ PME, 2 values of Ewald coefficients
and electrostatic parameter epsilon_r, with and without energy/virial compute.
Transformed grid (and possibly energy and virial gathered from the grid)
are tested as outputs.

Change-Id: I74c85b9d21e3ad30c8ad6c27c544690466ab3673

Mark Abraham [Thu, 13 Jul 2017 11:23:52 +0000 (11:23 +0000)]

Fix compilation issues with ARM SIMD

ARM_NEON has never supported double-precision SIMD, so it is now
disabled in GROMACS double-precision builds.

The maskzR* functions used the wrong argument order in the debug-mode
pre-masking (and sometimes in a typo-ed syntax).

In the shift operators, the clang-based compilers (including the
armclang v6 compiler series) seem to check that the required immediate
integer argument is given before inlining the call to the operator
function. The inlining seems to permit gcc to recognize that the
callers always use an immediate. In theory, the new code might
generate code that runs a trifle slower, but we don't use it at the
moment and the cost might be negligible if other effects dominate
performance.

Change-Id: I61dd4d906f7d5b77bc4e851cfaaaff059e5a67fe

Szilárd Páll [Wed, 29 Mar 2017 01:26:11 +0000 (03:26 +0200)]

Add OpenCL pruning kernels and launch/timing logic

The kernels have been tested for correctness on NVIDIA >=CC 3.5 and AMD
GCN devices. Tuning for AMD has been done on the old fglrx stack, which
has limitations on the intra-workgroup parallelism, so the choice of the
j4 concurrency parameter should be revisited at some later stage (using
the latest AMDGPU-PRO and hopefully ROCm).

A number of possible improvements have also been identified and noted as
comments in nbnxn_ocl.cpp.

Change-Id: I7129ec247706d33317df1256846943ee8b0d540c

Szilárd Páll [Wed, 19 Jul 2017 11:31:58 +0000 (13:31 +0200)]

Improve awkward handling of GMX_CUDA_TARGET* help

The previous hack had the sole purpose of making the cache variable
appear and be documented only if set manually by the user. The better
way to do this is to simply use set_property().

Change-Id: I9ba24167302ca1bb9021231697e50ecf056ea15a

Vedran Miletić [Mon, 14 Aug 2017 15:33:31 +0000 (17:33 +0200)]

Improve the "files not present" error message

It's possible to use -deffnm in restarts even if it wasn't used in
the initial simulation. This can lead to absurd situations such as:

Expected output files not present or named differently:
pullx.xvg
pullf.xvg

where pullx.xvg and pullf.xvg are present and named exactly as listed,
but GROMACS expects them to be named as -deffnm requested.

The improved error message suggests that the user check for that
possibility.

Refs #942 (partial workaround)

Change-Id: I983a7a2be791a634b877b0cbadb34e56a1ee2f82

Szilárd Páll [Tue, 28 Mar 2017 22:56:32 +0000 (00:56 +0200)]

Add CUDA pruning kernels and launch/timing logic

The CUDA prune kernels have been tuned and tested on CC 2.x-6.x
architectures. The tunable j4 concurrency parameter was chosen to be
identical for both kernel flavors irrespective of the input size, which
carries some tradeoffs (documented in the code). With auto-tuning,
such static choices could be revised.

Scheduling currently launches GPU prune kernels between steps, after the
force clearing. This has some drawbacks and more sophisticated
scheduling schemes might be beneficial in the future; these have been
documented alongside the declaration of nbnxn_gpu_launch_kernel_pruneonly().

Change-Id: Ia310ecafd7400efb33468d2486782c367a5a026a

Berk Hess [Fri, 24 Mar 2017 13:33:00 +0000 (14:33 +0100)]

Add CPU reference and SIMD pruning kernels

Change-Id: I416946adf67b8261b74093874b348270016585f1

Local GROMACS mirror with custom stuff