python3 -m venv $HOME/myvenv
. $HOME/myvenv/bin/activate
python -m ensurepip --default-pip
- pip install --upgrade pip setuptools
- pip install --upgrade cmake scikit-build
+ pip install --upgrade pip setuptools wheel
.. seealso:: :ref:`gmxapi venv`
*gmxapi* comes in three parts:
* GROMACS gmxapi library for C++.
-* This Python package, supporting Python 3.6 and higher
+* This Python package, supporting Python 3.7 and higher
* MD restraint plugins and sample gmxapi client code
GROMACS requirements
Build system requirements
-------------------------
-gmxapi can be built for Python 3.6 and higher.
+gmxapi can be built for Python 3.7 and higher.
-You will need a C++ 14 compatible compiler and a reasonably up-to-date version
+You will need a C++ 17 compatible compiler and a reasonably up-to-date version
of CMake.
Full gmxapi functionality may also require an MPI compiler (e.g. :command:`mpicc`).
Python environment requirements
-------------------------------
-gmxapi requires Python 3.6 or higher. Check your version with
+gmxapi requires Python 3.7 or higher. Check your version with
:command:`python3 --version` or :command:`python --version`.
.. note::
:command:`python3 --version` or :command:`python --version` and :command:`pip --version`.
To build and install, you need the Python packages for
-cmake_, networkx_, scikit-build_, and setuptools_
+cmake_, networkx_, and setuptools_
(all available from `PyPI with pip <https://pip.pypa.io/en/stable/>`_).
For full functionality, you should also have mpi4py_ and numpy_.
Install dependencies
^^^^^^^^^^^^^^^^^^^^
-It is always a good idea to update pip_ and setuptools_ before installing
+It is always a good idea to update pip_, setuptools_, and wheel_ before installing
new Python packages::
pip install --upgrade pip setuptools
::
- pip install --upgrade cmake scikit-build
+ pip install --upgrade cmake pybind11
For MPI, we use mpi4py_.
Make sure it is using the same MPI installation that we are building
For example, the last line of the previous example could be replaced with::
- pip install --no-cache-dir --no-deps --no-index --no-build-isolation .
+ pip install --no-cache-dir --no-deps --no-index .
Refer to pip_ documentation for descriptions of these options.
-------------------------
A source archive for the gmxapi python package can be built from the GROMACS
-source repository using Python ``setuptools`` and ``scikit-build``.
+source repository using Python ``setuptools``.
Example::
- pip install --upgrade setuptools scikit-build
+ pip install --upgrade setuptools
cd python_packaging/src
python setup.py sdist
*<version>* is the version from the ``setup.py`` file, and *<suffix>* is
determined by the local environment or by additional arguments to ``setup.py``.
+The new `build <https://pypa-build.readthedocs.io/en/latest/>`__ module is somewhat tidier.
+It automatically manages a temporary venv with the necessary dependencies::
+
+ pip install --upgrade build
+ cd python_packaging/src
+ python -m build --sdist .
+
.. seealso::
Python documentation for
location in the build tree. Sphinx can then import the package to automatically
extract Python docstrings.
+Note that this is an entirely CMake-driven installation and Python dependencies
+will not be installed automatically. You can update your Python environment
+(before configuring with CMake) using the :file:`requirements.txt` files provided
+in the :file:`python_packaging/` directory of the repository. Example::
+
+ pip install -r python_packaging/requirements-docs.txt
+
+or
+
+::
+
+ pip install -r python_packaging/requirements-test.txt
+
Sometimes the build environment can choose a different Python interpreter than
the one you intended.
- You can set the ``Python3_ROOT`` or ``CMAKE_PREFIX_PATH`` CMake variable to
+ You can set the ``Python3_ROOT_DIR`` or ``CMAKE_PREFIX_PATH`` CMake variable to
explicitly choose the Python installation or *venv* directory.
If you use pyenv or pyenv-virtualenv to dynamically manage your Python version,
you can help identify a particular version with ``pyenv version-name`` and the
directory with ``pyenv prefix {version}``. For example::
- -DPython3_ROOT=$(pyenv prefix $(pyenv version-name))
+ -DPython3_ROOT_DIR=$(pyenv prefix $(pyenv version-name))
Docker web server
-----------------
(see above) to help narrow down the problem before you
`open an issue <https://gitlab.com/gromacs/gromacs/-/issues/>`_.
+Errors regarding pybind11
+-------------------------
+
+An error may occur in ``setup.py`` with output that contains something like the following::
+
+ ModuleNotFoundError: No module named 'pybind11'
+ Building wheel for gmxapi (pyproject.toml): finished with status 'error'
+ ERROR: Failed building wheel for gmxapi
+ Failed to build gmxapi
+ ERROR: Could not build wheels for gmxapi, which is required to install pyproject.toml-based projects
+
+The important information here is that ``pybind11`` was not found.
+
+Build dependencies aren't always automatically installed.
+Even if you are using ``pip``, you may have disabled automatic dependency fulfillment with an option like ``--no-build-isolation`` or ``--no-deps``.
+
+In any case, the problem should be resolved by explicitly installing the ``pybind11``
+Python package before attempting to build ``gmxapi``::
+
+ pip install --upgrade pybind11
+
Couldn't find the ``gmxapi`` support library?
---------------------------------------------
.. _scikit-build: https://pypi.org/project/scikit-build/
.. _setuptools: https://pypi.org/project/setuptools/
+
+.. _wheel: https://pypi.org/project/wheel/
#include "gromacs/utility/logger.h"
#include "gromacs/utility/real.h"
#include "gromacs/utility/smalloc.h"
-#include "gromacs/utility/stringutil.h"
+#include "gromacs/utility/message_string_collector.h"
#include "gromacs/utility/unique_cptr.h"
#include "calculate_spline_moduli.h"
#include "pme_spline_work.h"
#include "pme_spread.h"
-/*! \brief Help build a descriptive message in \c error if there are
- * \c errorReasons why PME on GPU is not supported.
- *
- * \returns Whether the lack of errorReasons indicate there is support. */
-static bool addMessageIfNotSupported(const std::list<std::string>& errorReasons, std::string* error)
-{
- bool isSupported = errorReasons.empty();
- if (!isSupported && error)
- {
- std::string regressionTestMarker = "PME GPU does not support";
- // this prefix is tested for in the regression tests script gmxtest.pl
- *error = regressionTestMarker;
- if (errorReasons.size() == 1)
- {
- *error += " " + errorReasons.back();
- }
- else
- {
- *error += ": " + gmx::joinStrings(errorReasons, "; ");
- }
- *error += ".";
- }
- return isSupported;
-}
+//NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
+bool g_allowPmeWithSyclForTesting = false;
bool pme_gpu_supports_build(std::string* error)
{
- std::list<std::string> errorReasons;
- if (GMX_DOUBLE)
- {
- errorReasons.emplace_back("a double-precision build");
- }
- if (!GMX_GPU)
- {
- errorReasons.emplace_back("a non-GPU build");
- }
- if (GMX_GPU_SYCL)
- {
- errorReasons.emplace_back("SYCL build"); // SYCL-TODO
- }
- return addMessageIfNotSupported(errorReasons, error);
+ gmx::MessageStringCollector errorReasons;
+ // Before changing the prefix string, make sure that it is not searched for in regression tests.
+ errorReasons.startContext("PME GPU does not support:");
+ errorReasons.appendIf(GMX_DOUBLE, "Double-precision build of GROMACS.");
+ errorReasons.appendIf(!GMX_GPU, "Non-GPU build of GROMACS.");
+ errorReasons.appendIf(GMX_GPU_SYCL && !g_allowPmeWithSyclForTesting, "SYCL build."); // SYCL-TODO
+ errorReasons.finishContext();
+ if (error != nullptr)
+ {
+ *error = errorReasons.toString();
+ }
+ return errorReasons.isEmpty();
}
bool pme_gpu_supports_hardware(const gmx_hw_info_t gmx_unused& hwinfo, std::string* error)
{
- std::list<std::string> errorReasons;
-
- if (GMX_GPU_OPENCL)
- {
+ gmx::MessageStringCollector errorReasons;
+ // Before changing the prefix string, make sure that it is not searched for in regression tests.
+ errorReasons.startContext("PME GPU does not support:");
#ifdef __APPLE__
- errorReasons.emplace_back("Apple OS X operating system");
+ errorReasons.appendIf(GMX_GPU_OPENCL, "Apple OS X operating system");
#endif
+ errorReasons.finishContext();
+ if (error != nullptr)
+ {
+ *error = errorReasons.toString();
}
- return addMessageIfNotSupported(errorReasons, error);
+ return errorReasons.isEmpty();
}
bool pme_gpu_supports_input(const t_inputrec& ir, std::string* error)
{
- std::list<std::string> errorReasons;
- if (!EEL_PME(ir.coulombtype))
- {
- errorReasons.emplace_back("systems that do not use PME for electrostatics");
- }
- if (ir.pme_order != 4)
- {
- errorReasons.emplace_back("interpolation orders other than 4");
- }
- if (EVDW_PME(ir.vdwtype))
- {
- errorReasons.emplace_back("Lennard-Jones PME");
- }
- if (!EI_DYNAMICS(ir.eI))
- {
- errorReasons.emplace_back(
- "Cannot compute PME interactions on a GPU, because PME GPU requires a dynamical "
- "integrator (md, sd, etc).");
- }
- return addMessageIfNotSupported(errorReasons, error);
+ gmx::MessageStringCollector errorReasons;
+ // Before changing the prefix string, make sure that it is not searched for in regression tests.
+ errorReasons.startContext("PME GPU does not support:");
+ errorReasons.appendIf(!EEL_PME(ir.coulombtype),
+ "Systems that do not use PME for electrostatics.");
+ errorReasons.appendIf((ir.pme_order != 4), "Interpolation orders other than 4.");
+ errorReasons.appendIf(EVDW_PME(ir.vdwtype), "Lennard-Jones PME.");
+ errorReasons.appendIf(!EI_DYNAMICS(ir.eI), "Non-dynamical integrator (use md, sd, etc).");
+ errorReasons.finishContext();
+ if (error != nullptr)
+ {
+ *error = errorReasons.toString();
+ }
+ return errorReasons.isEmpty();
}
- std::list<std::string> errorReasons;
- if (ir.efep != efepNO)
+ bool pme_gpu_mixed_mode_supports_input(const t_inputrec& ir, std::string* error)
+ {
- errorReasons.emplace_back("Free Energy Perturbation (in PME GPU mixed mode)");
++ gmx::MessageStringCollector errorReasons;
++ // Before changing the prefix string, make sure that it is not searched for in regression tests.
++ errorReasons.startContext("PME GPU in Mixed mode does not support:");
++ errorReasons.appendIf(ir.efep != FreeEnergyPerturbationType::No, "Free Energy Perturbation.");
++ errorReasons.finishContext();
++ if (error != nullptr)
+ {
- return addMessageIfNotSupported(errorReasons, error);
++ *error = errorReasons.toString();
+ }
++ return errorReasons.isEmpty();
+ }
+
/*! \brief \libinternal
* Finds out if PME with given inputs is possible to run on GPU.
* This function is an internal final check, validating the whole PME structure on creation,
*/
static bool pme_gpu_check_restrictions(const gmx_pme_t* pme, std::string* error)
{
- std::list<std::string> errorReasons;
- if (pme->nnodes != 1)
- {
- errorReasons.emplace_back("PME decomposition");
- }
- if (pme->pme_order != 4)
- {
- errorReasons.emplace_back("interpolation orders other than 4");
- }
- if (pme->doLJ)
- {
- errorReasons.emplace_back("Lennard-Jones PME");
- }
- if (GMX_DOUBLE)
- {
- errorReasons.emplace_back("double precision");
- }
- if (!GMX_GPU)
- {
- errorReasons.emplace_back("non-GPU build of GROMACS");
- }
- if (GMX_GPU_SYCL)
- {
- errorReasons.emplace_back("SYCL build of GROMACS"); // SYCL-TODO
- }
- return addMessageIfNotSupported(errorReasons, error);
+ gmx::MessageStringCollector errorReasons;
+ // Before changing the prefix string, make sure that it is not searched for in regression tests.
+ errorReasons.startContext("PME GPU does not support:");
+ errorReasons.appendIf((pme->nnodes != 1), "PME decomposition.");
+ errorReasons.appendIf((pme->pme_order != 4), "interpolation orders other than 4.");
+ errorReasons.appendIf(pme->doLJ, "Lennard-Jones PME.");
+ errorReasons.appendIf(GMX_DOUBLE, "Double precision build of GROMACS.");
+ errorReasons.appendIf(!GMX_GPU, "Non-GPU build of GROMACS.");
+ errorReasons.appendIf(GMX_GPU_SYCL && !g_allowPmeWithSyclForTesting, "SYCL build of GROMACS."); // SYCL-TODO
+ errorReasons.finishContext();
+ if (error != nullptr)
+ {
+ *error = errorReasons.toString();
+ }
+ return errorReasons.isEmpty();
}
PmeRunMode pme_run_mode(const gmx_pme_t* pme)
const int pmeOrder,
const int dimIndex,
const bool doSpread) :
- dimind(dimIndex),
- bSpread(doSpread),
- pme_order(pmeOrder),
- nthread(numThreads),
- spline(nthread)
+ dimind(dimIndex), bSpread(doSpread), pme_order(pmeOrder), nthread(numThreads), spline(nthread)
{
if (PmeMpiCommunicator != MPI_COMM_NULL)
{
MPI_Status stat;
for (size_t b = 0; b < ol->comm_data.size(); b++)
{
- MPI_Sendrecv(&ol->send_size, 1, MPI_INT, ol->comm_data[b].send_id, b, &ol->comm_data[b].recv_size,
- 1, MPI_INT, ol->comm_data[b].recv_id, b, ol->mpi_comm, &stat);
+ MPI_Sendrecv(&ol->send_size,
+ 1,
+ MPI_INT,
+ ol->comm_data[b].send_id,
+ b,
+ &ol->comm_data[b].recv_size,
+ 1,
+ MPI_INT,
+ ol->comm_data[b].recv_id,
+ b,
+ ol->mpi_comm,
+ &stat);
}
#endif
std::string message = gmx::formatString(
"pme_order (%d) is larger than the maximum allowed value (%d). Modify and "
"recompile the code if you really need such a high order.",
- pme_order, PME_ORDER_MAX);
+ pme_order,
+ PME_ORDER_MAX);
GMX_THROW(gmx::InconsistentInputError(message));
}
"threads, the number of grid lines per rank along x should be >= pme_order (%d) "
"or = pmeorder-1. To resolve this issue, use fewer ranks along x (and possibly "
"more along y and/or z) by specifying -dd manually.",
- nkx / static_cast<double>(numPmeDomainsAlongX), pme_order);
+ nkx / static_cast<double>(numPmeDomainsAlongX),
+ pme_order);
}
return true;
gmx::unique_cptr<gmx_pme_t, gmx_pme_destroy> pme(new gmx_pme_t());
- pme->sum_qgrid_tmp = nullptr;
- pme->sum_qgrid_dd_tmp = nullptr;
-
pme->buf_nalloc = 0;
pme->nnodes = 1;
pme->ndecompdim = 2;
#if GMX_MPI
- MPI_Comm_split(pme->mpi_comm, pme->nodeid % numPmeDomains.y, pme->nodeid,
+ MPI_Comm_split(pme->mpi_comm,
+ pme->nodeid % numPmeDomains.y,
+ pme->nodeid,
&pme->mpi_comm_d[0]); /* My communicator along major dimension */
- MPI_Comm_split(pme->mpi_comm, pme->nodeid / numPmeDomains.y, pme->nodeid,
+ MPI_Comm_split(pme->mpi_comm,
+ pme->nodeid / numPmeDomains.y,
+ pme->nodeid,
&pme->mpi_comm_d[1]); /* My communicator along minor dimension */
MPI_Comm_rank(pme->mpi_comm_d[0], &pme->nodeid_major);
* not calculating free-energy for Coulomb and/or LJ while gmx_pme_init()
* configures with free-energy, but that has never been tested.
*/
- pme->doCoulomb = EEL_PME(ir->coulombtype);
- pme->doLJ = EVDW_PME(ir->vdwtype);
- pme->bFEP_q = ((ir->efep != efepNO) && bFreeEnergy_q);
- pme->bFEP_lj = ((ir->efep != efepNO) && bFreeEnergy_lj);
- pme->bFEP = (pme->bFEP_q || pme->bFEP_lj);
- pme->nkx = ir->nkx;
- pme->nky = ir->nky;
- pme->nkz = ir->nkz;
- pme->bP3M = (ir->coulombtype == eelP3M_AD || getenv("GMX_PME_P3M") != nullptr);
+ pme->doCoulomb = EEL_PME(ir->coulombtype);
+ pme->doLJ = EVDW_PME(ir->vdwtype);
+ pme->bFEP_q = ((ir->efep != FreeEnergyPerturbationType::No) && bFreeEnergy_q);
+ pme->bFEP_lj = ((ir->efep != FreeEnergyPerturbationType::No) && bFreeEnergy_lj);
+ pme->bFEP = (pme->bFEP_q || pme->bFEP_lj);
+ pme->nkx = ir->nkx;
+ pme->nky = ir->nky;
+ pme->nkz = ir->nkz;
+ pme->bP3M = (ir->coulombtype == CoulombInteractionType::P3mAD || getenv("GMX_PME_P3M") != nullptr);
pme->pme_order = ir->pme_order;
pme->ewaldcoeff_q = ewaldcoeff_q;
pme->ewaldcoeff_lj = ewaldcoeff_lj;
// The box requires scaling with nwalls = 2, we store that condition as well
// as the scaling factor
- delete pme->boxScaler;
- pme->boxScaler = new EwaldBoxZScaler(*ir);
+ pme->boxScaler = std::make_unique<EwaldBoxZScaler>(
+ EwaldBoxZScaler(inputrecPbcXY2Walls(ir), ir->wall_ewald_zfac));
/* If we violate restrictions, generate a fatal error here */
- gmx_pme_check_restrictions(pme->pme_order, pme->nkx, pme->nky, pme->nkz, pme->nnodes_major,
- pme->bUseThreads, true);
+ gmx_pme_check_restrictions(
+ pme->pme_order, pme->nkx, pme->nky, pme->nkz, pme->nnodes_major, pme->bUseThreads, true);
if (pme->nnodes > 1)
{
" and PME grid_y (%d) and grid_z (%d) should be divisible by "
"#PME_ranks_y "
"(%d)",
- gmx::roundToInt((imbal - 1) * 100), pme->nkx, pme->nky,
- pme->nnodes_major, pme->nky, pme->nkz, pme->nnodes_minor);
+ gmx::roundToInt((imbal - 1) * 100),
+ pme->nkx,
+ pme->nky,
+ pme->nnodes_major,
+ pme->nky,
+ pme->nkz,
+ pme->nnodes_minor);
}
}
* y is always copied through a buffer: we don't need padding in z,
* but we do need the overlap in x because of the communication order.
*/
- init_overlap_comm(&pme->overlap[0], pme->pme_order, pme->mpi_comm_d[0], pme->nnodes_major,
- pme->nodeid_major, pme->nkx,
+ init_overlap_comm(&pme->overlap[0],
+ pme->pme_order,
+ pme->mpi_comm_d[0],
+ pme->nnodes_major,
+ pme->nodeid_major,
+ pme->nkx,
(div_round_up(pme->nky, pme->nnodes_minor) + pme->pme_order)
* (pme->nkz + pme->pme_order - 1));
* We do this with an offset buffer of equal size, so we need to allocate
* extra for the offset. That's what the (+1)*pme->nkz is for.
*/
- init_overlap_comm(&pme->overlap[1], pme->pme_order, pme->mpi_comm_d[1], pme->nnodes_minor,
- pme->nodeid_minor, pme->nky,
+ init_overlap_comm(&pme->overlap[1],
+ pme->pme_order,
+ pme->mpi_comm_d[1],
+ pme->nnodes_minor,
+ pme->nodeid_minor,
+ pme->nky,
(div_round_up(pme->nkx, pme->nnodes_major) + pme->pme_order + 1) * pme->nkz);
/* Double-check for a limitation of the (current) sum_fftgrid_dd code.
pme->pmegrid_start_iy = pme->overlap[1].s2g0[pme->nodeid_minor];
pme->pmegrid_start_iz = 0;
- make_gridindex_to_localindex(pme->nkx, pme->pmegrid_start_ix,
- pme->pmegrid_nx - (pme->pme_order - 1), &pme->nnx, &pme->fshx);
- make_gridindex_to_localindex(pme->nky, pme->pmegrid_start_iy,
- pme->pmegrid_ny - (pme->pme_order - 1), &pme->nny, &pme->fshy);
- make_gridindex_to_localindex(pme->nkz, pme->pmegrid_start_iz, pme->pmegrid_nz_base, &pme->nnz,
- &pme->fshz);
+ make_gridindex_to_localindex(
+ pme->nkx, pme->pmegrid_start_ix, pme->pmegrid_nx - (pme->pme_order - 1), &pme->nnx, &pme->fshx);
+ make_gridindex_to_localindex(
+ pme->nky, pme->pmegrid_start_iy, pme->pmegrid_ny - (pme->pme_order - 1), &pme->nny, &pme->fshy);
+ make_gridindex_to_localindex(
+ pme->nkz, pme->pmegrid_start_iz, pme->pmegrid_nz_base, &pme->nnz, &pme->fshz);
pme->spline_work = make_pme_spline_work(pme->pme_order);
*/
if (pme->doLJ)
{
- pme->ngrids = ((ir->ljpme_combination_rule == eljpmeLB) ? DO_Q_AND_LJ_LB : DO_Q_AND_LJ);
+ pme->ngrids = ((ir->ljpme_combination_rule == LongRangeVdW::LB) ? DO_Q_AND_LJ_LB : DO_Q_AND_LJ);
}
else
{
{
if ((i < DO_Q && pme->doCoulomb && (i == 0 || bFreeEnergy_q))
|| (i >= DO_Q && pme->doLJ
- && (i == 2 || bFreeEnergy_lj || ir->ljpme_combination_rule == eljpmeLB)))
+ && (i == 2 || bFreeEnergy_lj || ir->ljpme_combination_rule == LongRangeVdW::LB)))
{
- pmegrids_init(&pme->pmegrid[i], pme->pmegrid_nx, pme->pmegrid_ny, pme->pmegrid_nz,
- pme->pmegrid_nz_base, pme->pme_order, pme->bUseThreads, pme->nthread,
+ pmegrids_init(&pme->pmegrid[i],
+ pme->pmegrid_nx,
+ pme->pmegrid_ny,
+ pme->pmegrid_nz,
+ pme->pmegrid_nz_base,
+ pme->pme_order,
+ pme->bUseThreads,
+ pme->nthread,
pme->overlap[0].s2g1[pme->nodeid_major]
- pme->overlap[0].s2g0[pme->nodeid_major + 1],
pme->overlap[1].s2g1[pme->nodeid_minor]
const auto allocateRealGridForGpu = (pme->runMode == PmeRunMode::Mixed)
? gmx::PinningPolicy::PinnedIfSupported
: gmx::PinningPolicy::CannotBePinned;
- gmx_parallel_3dfft_init(&pme->pfft_setup[i], ndata, &pme->fftgrid[i], &pme->cfftgrid[i],
- pme->mpi_comm_d, bReproducible, pme->nthread, allocateRealGridForGpu);
+ gmx_parallel_3dfft_init(&pme->pfft_setup[i],
+ ndata,
+ &pme->fftgrid[i],
+ &pme->cfftgrid[i],
+ pme->mpi_comm_d,
+ bReproducible,
+ pme->nthread,
+ allocateRealGridForGpu);
}
}
const gmx::MDLogger dummyLogger;
GMX_ASSERT(pmedata, "Invalid PME pointer");
NumPmeDomains numPmeDomains = { pme_src->nnodes_major, pme_src->nnodes_minor };
- *pmedata = gmx_pme_init(cr, numPmeDomains, &irc, pme_src->bFEP_q, pme_src->bFEP_lj, FALSE,
- ewaldcoeff_q, ewaldcoeff_lj, pme_src->nthread, pme_src->runMode,
- pme_src->gpu, nullptr, nullptr, nullptr, dummyLogger);
+ *pmedata = gmx_pme_init(cr,
+ numPmeDomains,
+ &irc,
+ pme_src->bFEP_q,
+ pme_src->bFEP_lj,
+ FALSE,
+ ewaldcoeff_q,
+ ewaldcoeff_lj,
+ pme_src->nthread,
+ pme_src->runMode,
+ pme_src->gpu,
+ nullptr,
+ nullptr,
+ nullptr,
+ dummyLogger);
/* When running PME on the CPU not using domain decomposition,
* the atom data is allocated once only in gmx_pme_(re)init().
*/
if (!pme_src->gpu && pme_src->nnodes == 1)
{
- gmx_pme_reinit_atoms(*pmedata, pme_src->atc[0].numAtoms(), nullptr, nullptr);
+ gmx_pme_reinit_atoms(*pmedata, pme_src->atc[0].numAtoms(), {}, {});
}
// TODO this is mostly passing around current values
}
/* We would like to reuse the fft grids, but that's harder */
}
-void gmx_pme_calc_energy(gmx_pme_t* pme, gmx::ArrayRef<const gmx::RVec> x, gmx::ArrayRef<const real> q, real* V)
+real gmx_pme_calc_energy(gmx_pme_t* pme, gmx::ArrayRef<const gmx::RVec> x, gmx::ArrayRef<const real> q)
{
pmegrids_t* grid;
/* Only calculate the spline coefficients, don't actually spread */
spread_on_grid(pme, atc, nullptr, TRUE, FALSE, pme->fftgrid[PME_GRID_QA], FALSE, PME_GRID_QA);
- *V = gather_energy_bsplines(pme, grid->grid.grid, atc);
+ return gather_energy_bsplines(pme, grid->grid.grid, atc);
}
/*! \brief Calculate initial Lorentz-Berthelot coefficients for LJ-PME */
-static void calc_initial_lb_coeffs(gmx::ArrayRef<real> coefficient, const real* local_c6, const real* local_sigma)
+static void calc_initial_lb_coeffs(gmx::ArrayRef<real> coefficient,
+ gmx::ArrayRef<const real> local_c6,
+ gmx::ArrayRef<const real> local_sigma)
{
for (gmx::index i = 0; i < coefficient.ssize(); ++i)
{
}
/*! \brief Calculate next Lorentz-Berthelot coefficients for LJ-PME */
-static void calc_next_lb_coeffs(gmx::ArrayRef<real> coefficient, const real* local_sigma)
+static void calc_next_lb_coeffs(gmx::ArrayRef<real> coefficient, gmx::ArrayRef<const real> local_sigma)
{
for (gmx::index i = 0; i < coefficient.ssize(); ++i)
{
int gmx_pme_do(struct gmx_pme_t* pme,
gmx::ArrayRef<const gmx::RVec> coordinates,
gmx::ArrayRef<gmx::RVec> forces,
- real chargeA[],
- real chargeB[],
- real c6A[],
- real c6B[],
- real sigmaA[],
- real sigmaB[],
+ gmx::ArrayRef<const real> chargeA,
+ gmx::ArrayRef<const real> chargeB,
+ gmx::ArrayRef<const real> c6A,
+ gmx::ArrayRef<const real> c6B,
+ gmx::ArrayRef<const real> sigmaA,
+ gmx::ArrayRef<const real> sigmaB,
const matrix box,
const t_commrec* cr,
int maxshift_x,
GMX_ASSERT(pme->runMode == PmeRunMode::CPU,
"gmx_pme_do should not be called on the GPU PME run.");
- int d, npme, grid_index, max_grid_index;
- PmeAtomComm& atc = pme->atc[0];
- pmegrids_t* pmegrid = nullptr;
- real* grid = nullptr;
- real* coefficient = nullptr;
- PmeOutput output[2]; // The second is used for the B state with FEP
- real scale, lambda;
- gmx_bool bClearF;
- gmx_parallel_3dfft_t pfft_setup;
- real* fftgrid;
- t_complex* cfftgrid;
- int thread;
- gmx_bool bFirst, bDoSplines;
- int fep_state;
- int fep_states_lj = pme->bFEP_lj ? 2 : 1;
+ PmeAtomComm& atc = pme->atc[0];
+ pmegrids_t* pmegrid = nullptr;
+ real* grid = nullptr;
+ gmx::ArrayRef<const real> coefficient;
+ std::array<PmeOutput, 2> output; // The second is used for the B state with FEP
+ gmx_parallel_3dfft_t pfft_setup;
+ real* fftgrid;
+ t_complex* cfftgrid;
+ int thread;
+ const int fep_states_lj = pme->bFEP_lj ? 2 : 1;
// There's no support for computing energy without virial, or vice versa
const bool computeEnergyAndVirial = (stepWork.computeEnergy || stepWork.computeVirial);
pme->boxScaler->scaleBox(box, scaledBox);
gmx::invertBoxMatrix(scaledBox, pme->recipbox);
- bFirst = TRUE;
+ bool bFirst = true;
/* For simplicity, we construct the splines for all particles if
* more than one PME calculations is needed. Some optimization
* that don't yet have them.
*/
- bDoSplines = pme->bFEP || (pme->doCoulomb && pme->doLJ);
+ bool bDoSplines = pme->bFEP || (pme->doCoulomb && pme->doLJ);
/* We need a maximum of four separate PME calculations:
* grid_index=0: Coulomb PME with charges from state A
*/
/* If we are doing LJ-PME with LB, we only do Q here */
- max_grid_index = (pme->ljpme_combination_rule == eljpmeLB) ? DO_Q : DO_Q_AND_LJ;
-
- for (grid_index = 0; grid_index < max_grid_index; ++grid_index)
+ const int max_grid_index = (pme->ljpme_combination_rule == LongRangeVdW::LB) ? DO_Q : DO_Q_AND_LJ;
+ bool bClearF;
+ for (int grid_index = 0; grid_index < max_grid_index; ++grid_index)
{
/* Check if we should do calculations at this grid_index
* If grid_index is odd we should be doing FEP
grid = pmegrid->grid.grid;
- if (debug)
- {
- fprintf(debug, "PME: number of ranks = %d, rank = %d\n", cr->nnodes, cr->nodeid);
- fprintf(debug, "Grid = %p\n", static_cast<void*>(grid));
- if (grid == nullptr)
- {
- gmx_fatal(FARGS, "No grid!");
- }
- }
-
if (pme->nnodes == 1)
{
- atc.coefficient = gmx::arrayRefFromArray(coefficient, coordinates.size());
+ atc.coefficient = coefficient;
}
else
{
- wallcycle_start(wcycle, ewcPME_REDISTXF);
+ wallcycle_start(wcycle, WallCycleCounter::PmeRedistXF);
do_redist_pos_coeffs(pme, cr, bFirst, coordinates, coefficient);
- wallcycle_stop(wcycle, ewcPME_REDISTXF);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeRedistXF);
}
- if (debug)
- {
- fprintf(debug, "Rank= %6d, pme local particles=%6d\n", cr->nodeid, atc.numAtoms());
- }
-
- wallcycle_start(wcycle, ewcPME_SPREAD);
+ wallcycle_start(wcycle, WallCycleCounter::PmeSpread);
/* Spread the coefficients on a grid */
spread_on_grid(pme, &atc, pmegrid, bFirst, TRUE, fftgrid, bDoSplines, grid_index);
copy_pmegrid_to_fftgrid(pme, grid, fftgrid, grid_index);
}
- wallcycle_stop(wcycle, ewcPME_SPREAD);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeSpread);
/* TODO If the OpenMP and single-threaded implementations
converge, then spread_on_grid() and
/* do 3d-fft */
if (thread == 0)
{
- wallcycle_start(wcycle, ewcPME_FFT);
+ wallcycle_start(wcycle, WallCycleCounter::PmeFft);
}
gmx_parallel_3dfft_execute(pfft_setup, GMX_FFT_REAL_TO_COMPLEX, thread, wcycle);
if (thread == 0)
{
- wallcycle_stop(wcycle, ewcPME_FFT);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeFft);
}
/* solve in k-space for our local cells */
if (thread == 0)
{
- wallcycle_start(wcycle, (grid_index < DO_Q ? ewcPME_SOLVE : ewcLJPME));
+ wallcycle_start(
+ wcycle,
+ (grid_index < DO_Q ? WallCycleCounter::PmeSolve : WallCycleCounter::LJPme));
}
if (grid_index < DO_Q)
{
- loop_count = solve_pme_yzx(
- pme, cfftgrid, scaledBox[XX][XX] * scaledBox[YY][YY] * scaledBox[ZZ][ZZ],
- computeEnergyAndVirial, pme->nthread, thread);
+ loop_count = solve_pme_yzx(pme,
+ cfftgrid,
+ scaledBox[XX][XX] * scaledBox[YY][YY] * scaledBox[ZZ][ZZ],
+ computeEnergyAndVirial,
+ pme->nthread,
+ thread);
}
else
{
loop_count =
- solve_pme_lj_yzx(pme, &cfftgrid, FALSE,
+ solve_pme_lj_yzx(pme,
+ &cfftgrid,
+ FALSE,
scaledBox[XX][XX] * scaledBox[YY][YY] * scaledBox[ZZ][ZZ],
- computeEnergyAndVirial, pme->nthread, thread);
+ computeEnergyAndVirial,
+ pme->nthread,
+ thread);
}
if (thread == 0)
{
- wallcycle_stop(wcycle, (grid_index < DO_Q ? ewcPME_SOLVE : ewcLJPME));
+ wallcycle_stop(
+ wcycle,
+ (grid_index < DO_Q ? WallCycleCounter::PmeSolve : WallCycleCounter::LJPme));
inc_nrnb(nrnb, eNR_SOLVEPME, loop_count);
}
/* do 3d-invfft */
if (thread == 0)
{
- wallcycle_start(wcycle, ewcPME_FFT);
+ wallcycle_start(wcycle, WallCycleCounter::PmeFft);
}
gmx_parallel_3dfft_execute(pfft_setup, GMX_FFT_COMPLEX_TO_REAL, thread, wcycle);
if (thread == 0)
{
- wallcycle_stop(wcycle, ewcPME_FFT);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeFft);
if (pme->nodeid == 0)
{
- real ntot = pme->nkx * pme->nky * pme->nkz;
- npme = static_cast<int>(ntot * std::log(ntot) / std::log(2.0));
+ real ntot = pme->nkx * pme->nky * pme->nkz;
+ const int npme = static_cast<int>(ntot * std::log(ntot) / std::log(2.0));
inc_nrnb(nrnb, eNR_FFT, 2 * npme);
}
/* Note: this wallcycle region is closed below
outside an OpenMP region, so take care if
refactoring code here. */
- wallcycle_start(wcycle, ewcPME_GATHER);
+ wallcycle_start(wcycle, WallCycleCounter::PmeGather);
}
copy_fftgrid_to_pmegrid(pme, fftgrid, grid, grid_index, pme->nthread, thread);
* atc->f is the actual force array, not a buffer,
* therefore we should not clear it.
*/
- lambda = grid_index < DO_Q ? lambda_q : lambda_lj;
- bClearF = (bFirst && PAR(cr));
+ real lambda = grid_index < DO_Q ? lambda_q : lambda_lj;
+ bClearF = (bFirst && PAR(cr));
#pragma omp parallel for num_threads(pme->nthread) schedule(static)
for (thread = 0; thread < pme->nthread; thread++)
{
try
{
- gather_f_bsplines(pme, grid, bClearF, &atc, &atc.spline[thread],
+ gather_f_bsplines(pme,
+ grid,
+ bClearF,
+ &atc,
+ &atc.spline[thread],
pme->bFEP ? (grid_index % 2 == 0 ? 1.0 - lambda : lambda) : 1.0);
}
GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
}
- inc_nrnb(nrnb, eNR_GATHERFBSP,
- pme->pme_order * pme->pme_order * pme->pme_order * atc.numAtoms());
+ inc_nrnb(nrnb, eNR_GATHERFBSP, pme->pme_order * pme->pme_order * pme->pme_order * atc.numAtoms());
/* Note: this wallcycle region is opened above inside an OpenMP
region, so take care if refactoring code here. */
- wallcycle_stop(wcycle, ewcPME_GATHER);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeGather);
}
if (computeEnergyAndVirial)
/* For Lorentz-Berthelot combination rules in LJ-PME, we need to calculate
* seven terms. */
- if (pme->doLJ && pme->ljpme_combination_rule == eljpmeLB)
+ if (pme->doLJ && pme->ljpme_combination_rule == LongRangeVdW::LB)
{
/* Loop over A- and B-state if we are doing FEP */
- for (fep_state = 0; fep_state < fep_states_lj; ++fep_state)
+ for (int fep_state = 0; fep_state < fep_states_lj; ++fep_state)
{
- real *local_c6 = nullptr, *local_sigma = nullptr, *RedistC6 = nullptr, *RedistSigma = nullptr;
- gmx::ArrayRef<real> coefficientBuffer;
+ std::vector<real> local_c6;
+ std::vector<real> local_sigma;
+ gmx::ArrayRef<const real> RedistC6;
+ gmx::ArrayRef<const real> RedistSigma;
+ gmx::ArrayRef<real> coefficientBuffer;
if (pme->nnodes == 1)
{
pme->lb_buf1.resize(atc.numAtoms());
switch (fep_state)
{
case 0:
- local_c6 = c6A;
- local_sigma = sigmaA;
+ local_c6.assign(c6A.begin(), c6A.end());
+ local_sigma.assign(sigmaA.begin(), sigmaA.end());
break;
case 1:
- local_c6 = c6B;
- local_sigma = sigmaB;
+ local_c6.assign(c6B.begin(), c6B.end());
+ local_sigma.assign(sigmaB.begin(), sigmaB.end());
break;
default: gmx_incons("Trying to access wrong FEP-state in LJ-PME routine");
}
break;
default: gmx_incons("Trying to access wrong FEP-state in LJ-PME routine");
}
- wallcycle_start(wcycle, ewcPME_REDISTXF);
+ wallcycle_start(wcycle, WallCycleCounter::PmeRedistXF);
do_redist_pos_coeffs(pme, cr, bFirst, coordinates, RedistC6);
pme->lb_buf1.resize(atc.numAtoms());
pme->lb_buf2.resize(atc.numAtoms());
- local_c6 = pme->lb_buf1.data();
+ local_c6.assign(pme->lb_buf1.begin(), pme->lb_buf1.end());
for (int i = 0; i < atc.numAtoms(); ++i)
{
local_c6[i] = atc.coefficient[i];
}
do_redist_pos_coeffs(pme, cr, FALSE, coordinates, RedistSigma);
- local_sigma = pme->lb_buf2.data();
+ local_sigma.assign(pme->lb_buf2.begin(), pme->lb_buf2.end());
for (int i = 0; i < atc.numAtoms(); ++i)
{
local_sigma[i] = atc.coefficient[i];
}
- wallcycle_stop(wcycle, ewcPME_REDISTXF);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeRedistXF);
}
atc.coefficient = coefficientBuffer;
calc_initial_lb_coeffs(coefficientBuffer, local_c6, local_sigma);
/*Seven terms in LJ-PME with LB, grid_index < 2 reserved for electrostatics*/
- for (grid_index = 2; grid_index < 9; ++grid_index)
+ for (int grid_index = 2; grid_index < 9; ++grid_index)
{
/* Unpack structure */
pmegrid = &pme->pmegrid[grid_index];
calc_next_lb_coeffs(coefficientBuffer, local_sigma);
grid = pmegrid->grid.grid;
- wallcycle_start(wcycle, ewcPME_SPREAD);
+ wallcycle_start(wcycle, WallCycleCounter::PmeSpread);
/* Spread the c6 on a grid */
spread_on_grid(pme, &atc, pmegrid, bFirst, TRUE, fftgrid, bDoSplines, grid_index);
inc_nrnb(nrnb, eNR_WEIGHTS, DIM * atc.numAtoms());
}
- inc_nrnb(nrnb, eNR_SPREADBSP,
+ inc_nrnb(nrnb,
+ eNR_SPREADBSP,
pme->pme_order * pme->pme_order * pme->pme_order * atc.numAtoms());
if (pme->nthread == 1)
{
}
copy_pmegrid_to_fftgrid(pme, grid, fftgrid, grid_index);
}
- wallcycle_stop(wcycle, ewcPME_SPREAD);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeSpread);
/*Here we start a large thread parallel region*/
#pragma omp parallel num_threads(pme->nthread) private(thread)
/* do 3d-fft */
if (thread == 0)
{
- wallcycle_start(wcycle, ewcPME_FFT);
+ wallcycle_start(wcycle, WallCycleCounter::PmeFft);
}
gmx_parallel_3dfft_execute(pfft_setup, GMX_FFT_REAL_TO_COMPLEX, thread, wcycle);
if (thread == 0)
{
- wallcycle_stop(wcycle, ewcPME_FFT);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeFft);
}
}
GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
thread = gmx_omp_get_thread_num();
if (thread == 0)
{
- wallcycle_start(wcycle, ewcLJPME);
+ wallcycle_start(wcycle, WallCycleCounter::LJPme);
}
loop_count =
- solve_pme_lj_yzx(pme, &pme->cfftgrid[2], TRUE,
+ solve_pme_lj_yzx(pme,
+ &pme->cfftgrid[2],
+ TRUE,
scaledBox[XX][XX] * scaledBox[YY][YY] * scaledBox[ZZ][ZZ],
- computeEnergyAndVirial, pme->nthread, thread);
+ computeEnergyAndVirial,
+ pme->nthread,
+ thread);
if (thread == 0)
{
- wallcycle_stop(wcycle, ewcLJPME);
+ wallcycle_stop(wcycle, WallCycleCounter::LJPme);
inc_nrnb(nrnb, eNR_SOLVEPME, loop_count);
}
}
bFirst = !pme->doCoulomb;
calc_initial_lb_coeffs(coefficientBuffer, local_c6, local_sigma);
- for (grid_index = 8; grid_index >= 2; --grid_index)
+ for (int grid_index = 8; grid_index >= 2; --grid_index)
{
/* Unpack structure */
pmegrid = &pme->pmegrid[grid_index];
/* do 3d-invfft */
if (thread == 0)
{
- wallcycle_start(wcycle, ewcPME_FFT);
+ wallcycle_start(wcycle, WallCycleCounter::PmeFft);
}
gmx_parallel_3dfft_execute(pfft_setup, GMX_FFT_COMPLEX_TO_REAL, thread, wcycle);
if (thread == 0)
{
- wallcycle_stop(wcycle, ewcPME_FFT);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeFft);
if (pme->nodeid == 0)
{
- real ntot = pme->nkx * pme->nky * pme->nkz;
- npme = static_cast<int>(ntot * std::log(ntot) / std::log(2.0));
+ real ntot = pme->nkx * pme->nky * pme->nkz;
+ const int npme = static_cast<int>(ntot * std::log(ntot) / std::log(2.0));
inc_nrnb(nrnb, eNR_FFT, 2 * npme);
}
- wallcycle_start(wcycle, ewcPME_GATHER);
+ wallcycle_start(wcycle, WallCycleCounter::PmeGather);
}
copy_fftgrid_to_pmegrid(pme, fftgrid, grid, grid_index, pme->nthread, thread);
if (stepWork.computeForces)
{
/* interpolate forces for our local atoms */
- bClearF = (bFirst && PAR(cr));
- scale = pme->bFEP ? (fep_state < 1 ? 1.0 - lambda_lj : lambda_lj) : 1.0;
+ bClearF = (bFirst && PAR(cr));
+ real scale = pme->bFEP ? (fep_state < 1 ? 1.0 - lambda_lj : lambda_lj) : 1.0;
scale *= lb_scale_factor[grid_index - 2];
#pragma omp parallel for num_threads(pme->nthread) schedule(static)
{
try
{
- gather_f_bsplines(pme, grid, bClearF, &pme->atc[0],
- &pme->atc[0].spline[thread], scale);
+ gather_f_bsplines(
+ pme, grid, bClearF, &pme->atc[0], &pme->atc[0].spline[thread], scale);
}
GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
}
- inc_nrnb(nrnb, eNR_GATHERFBSP,
+ inc_nrnb(nrnb,
+ eNR_GATHERFBSP,
pme->pme_order * pme->pme_order * pme->pme_order * pme->atc[0].numAtoms());
}
- wallcycle_stop(wcycle, ewcPME_GATHER);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeGather);
bFirst = FALSE;
} /* for (grid_index = 8; grid_index >= 2; --grid_index) */
} /* for (fep_state = 0; fep_state < fep_states_lj; ++fep_state) */
- } /* if (pme->doLJ && pme->ljpme_combination_rule == eljpmeLB) */
+ } /* if (pme->doLJ && pme->ljpme_combination_rule == LongRangeVdW::LB) */
if (stepWork.computeForces && pme->nnodes > 1)
{
- wallcycle_start(wcycle, ewcPME_REDISTXF);
- for (d = 0; d < pme->ndecompdim; d++)
+ wallcycle_start(wcycle, WallCycleCounter::PmeRedistXF);
+ for (int d = 0; d < pme->ndecompdim; d++)
{
gmx::ArrayRef<gmx::RVec> forcesRef;
if (d == pme->ndecompdim - 1)
{
forcesRef = pme->atc[d + 1].f;
}
- if (DOMAINDECOMP(cr))
+ if (haveDDAtomOrdering(*cr))
{
dd_pmeredist_f(pme, &pme->atc[d], forcesRef, d == pme->ndecompdim - 1 && pme->bPPnode);
}
}
- wallcycle_stop(wcycle, ewcPME_REDISTXF);
+ wallcycle_stop(wcycle, WallCycleCounter::PmeRedistXF);
}
if (computeEnergyAndVirial)
}
}
}
- if (debug)
- {
- fprintf(debug, "Electrostatic PME mesh energy: %g\n", *energy_q);
- }
}
else
{
}
}
}
- if (debug)
- {
- fprintf(debug, "Lennard-Jones PME mesh energy: %g\n", *energy_lj);
- }
}
else
{
return;
}
- delete pme->boxScaler;
-
sfree(pme->nnx);
sfree(pme->nny);
sfree(pme->nnz);
pme_free_all_work(&pme->solve_work, pme->nthread);
}
- sfree(pme->sum_qgrid_tmp);
- sfree(pme->sum_qgrid_dd_tmp);
-
destroy_pme_spline_work(pme->spline_work);
if (pme->gpu != nullptr)
delete pme;
}
-void gmx_pme_reinit_atoms(gmx_pme_t* pme, const int numAtoms, const real* chargesA, const real* chargesB)
+void gmx_pme_reinit_atoms(gmx_pme_t* pme,
+ const int numAtoms,
+ gmx::ArrayRef<const real> chargesA,
+ gmx::ArrayRef<const real> chargesB)
{
if (pme->gpu != nullptr)
{
- GMX_ASSERT(!(pme->bFEP_q && chargesB == nullptr),
+ GMX_ASSERT(!(pme->bFEP_q && chargesB.empty()),
"B state charges must be specified if running Coulomb FEP on the GPU");
- pme_gpu_reinit_atoms(pme->gpu, numAtoms, chargesA, pme->bFEP_q ? chargesB : nullptr);
+ pme_gpu_reinit_atoms(pme->gpu, numAtoms, chargesA.data(), pme->bFEP_q ? chargesB.data() : nullptr);
}
else
{
{
return (pme.nkx == grid_size[XX] && pme.nky == grid_size[YY] && pme.nkz == grid_size[ZZ]);
}
+
+void gmx::SeparatePmeRanksPermitted::disablePmeRanks(const std::string& reason)
+{
+ permitSeparatePmeRanks_ = false;
+
+ if (!reason.empty())
+ {
+ reasons_.push_back(reason);
+ }
+}
+
+bool gmx::SeparatePmeRanksPermitted::permitSeparatePmeRanks() const
+{
+ return permitSeparatePmeRanks_;
+}
+
+std::string gmx::SeparatePmeRanksPermitted::reasonsWhyDisabled() const
+{
+ return joinStrings(reasons_, "; ");
+}
#define GMX_EWALD_PME_H
#include <string>
+#include <vector>
#include "gromacs/gpu_utils/devicebuffer_datatype.h"
#include "gromacs/gpu_utils/gpu_macros.h"
#include "gromacs/math/vectypes.h"
-#include "gromacs/utility/basedefinitions.h"
#include "gromacs/utility/real.h"
struct gmx_hw_info_t;
class PmeGpuProgram;
class GpuEventSynchronizer;
+/*! \brief Hack to selectively enable some parts of PME during unit testing.
+ *
+ * Set to \c false by default. If any of the tests sets it to \c true, it will
+ * make the compatibility check consider PME to be supported in SYCL builds.
+ *
+ * Currently we don't have proper PME implementation with SYCL, but we still want
+ * to run tests for some of the kernels.
+ *
+ * \todo Remove after #3927 is done and PME is fully enabled in SYCL builds.
+ */
+//NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
+extern bool g_allowPmeWithSyclForTesting;
+
namespace gmx
{
template<typename>
class MDLogger;
enum class PinningPolicy : int;
class StepWorkload;
+
+/*! \libinternal \brief Class for managing usage of separate PME-only ranks
+ *
+ * Used for checking if some parts of the code could not use PME-only ranks
+ *
+ */
+class SeparatePmeRanksPermitted
+{
+public:
+ //! Disables PME ranks permitted flag with a reason
+ void disablePmeRanks(const std::string& reason);
+ //! Return status of PME ranks usage
+ bool permitSeparatePmeRanks() const;
+ //! Returns all reasons, for not using PME ranks
+ std::string reasonsWhyDisabled() const;
+
+private:
+ //! Flag that informs whether simualtion could use dedicated PME ranks
+ bool permitSeparatePmeRanks_ = true;
+ //! Storage for all reasons, why PME ranks could not be used
+ std::vector<std::string> reasons_;
+};
+
+class PmeCoordinateReceiverGpu;
} // namespace gmx
enum
int gmx_pme_do(struct gmx_pme_t* pme,
gmx::ArrayRef<const gmx::RVec> coordinates,
gmx::ArrayRef<gmx::RVec> forces,
- real chargeA[],
- real chargeB[],
- real c6A[],
- real c6B[],
- real sigmaA[],
- real sigmaB[],
+ gmx::ArrayRef<const real> chargeA,
+ gmx::ArrayRef<const real> chargeB,
+ gmx::ArrayRef<const real> c6A,
+ gmx::ArrayRef<const real> c6B,
+ gmx::ArrayRef<const real> sigmaA,
+ gmx::ArrayRef<const real> sigmaB,
const matrix box,
const t_commrec* cr,
int maxshift_x,
* pme struct. Currently does not work in parallel or with free
* energy.
*/
-void gmx_pme_calc_energy(gmx_pme_t* pme, gmx::ArrayRef<const gmx::RVec> x, gmx::ArrayRef<const real> q, real* V);
+real gmx_pme_calc_energy(gmx_pme_t* pme, gmx::ArrayRef<const gmx::RVec> x, gmx::ArrayRef<const real> q);
/*! \brief
* This function updates the local atom data on GPU after DD (charges, coordinates, etc.).
* \param[in] chargesB The pointer to the array of particle charges in state B. Only used if
* charges are perturbed and can otherwise be nullptr.
*/
-void gmx_pme_reinit_atoms(gmx_pme_t* pme, int numAtoms, const real* chargesA, const real* chargesB);
+void gmx_pme_reinit_atoms(gmx_pme_t* pme,
+ int numAtoms,
+ gmx::ArrayRef<const real> chargesA,
+ gmx::ArrayRef<const real> chargesB);
/* A block of PME GPU functions */
*/
bool pme_gpu_supports_input(const t_inputrec& ir, std::string* error);
+ /*! \brief Checks whether the input system allows to run PME on GPU in Mixed mode.
+ * Assumes that the input system is compatible with GPU PME otherwise, that is,
+ * before calling this function one should check that \ref pme_gpu_supports_input returns \c true.
+ *
+ * \param[in] ir Input system.
+ * \param[out] error If non-null, the error message if the input is not supported.
+ *
+ * \returns true if PME can run on GPU in Mixed mode with this input, false otherwise.
+ */
+ bool pme_gpu_mixed_mode_supports_input(const t_inputrec& ir, std::string* error);
+
/*! \brief
* Returns the active PME codepath (CPU, GPU, mixed).
* \todo This is a rather static data that should be managed by the higher level task scheduler.
/*! \brief
* Launches first stage of PME on GPU - spreading kernel.
*
- * \param[in] pme The PME data structure.
- * \param[in] xReadyOnDevice Event synchronizer indicating that the coordinates
- * are ready in the device memory; nullptr allowed only on separate PME ranks.
- * \param[in] wcycle The wallclock counter.
- * \param[in] lambdaQ The Coulomb lambda of the current state of the
- * system. Only used if FEP of Coulomb is active.
+ * \param[in] pme The PME data structure.
+ * \param[in] xReadyOnDevice Event synchronizer indicating that the coordinates
+ * are ready in the device memory; nullptr allowed only
+ * on separate PME ranks.
+ * \param[in] wcycle The wallclock counter.
+ * \param[in] lambdaQ The Coulomb lambda of the current state of the
+ * system. Only used if FEP of Coulomb is active.
+ * \param[in] useGpuDirectComm Whether direct GPU PME-PP communication is active
+ * \param[in] pmeCoordinateReceiverGpu Coordinate receiver object, which must be valid when
+ * direct GPU PME-PP communication is active
*/
-GPU_FUNC_QUALIFIER void pme_gpu_launch_spread(gmx_pme_t* GPU_FUNC_ARGUMENT(pme),
- GpuEventSynchronizer* GPU_FUNC_ARGUMENT(xReadyOnDevice),
- gmx_wallcycle* GPU_FUNC_ARGUMENT(wcycle),
- real GPU_FUNC_ARGUMENT(lambdaQ)) GPU_FUNC_TERM;
+GPU_FUNC_QUALIFIER void pme_gpu_launch_spread(
+ gmx_pme_t* GPU_FUNC_ARGUMENT(pme),
+ GpuEventSynchronizer* GPU_FUNC_ARGUMENT(xReadyOnDevice),
+ gmx_wallcycle* GPU_FUNC_ARGUMENT(wcycle),
+ real GPU_FUNC_ARGUMENT(lambdaQ),
+ bool GPU_FUNC_ARGUMENT(useGpuDirectComm),
+ gmx::PmeCoordinateReceiverGpu* GPU_FUNC_ARGUMENT(pmeCoordinateReceiverGpu)) GPU_FUNC_TERM;
/*! \brief
* Launches middle stages of PME (FFT R2C, solving, FFT C2R) either on GPU or on CPU, depending on the run mode.
* \param[in] pme The PME data structure.
* \returns Pointer to force data
*/
-GPU_FUNC_QUALIFIER void* pme_gpu_get_device_f(const gmx_pme_t* GPU_FUNC_ARGUMENT(pme))
- GPU_FUNC_TERM_WITH_RETURN(nullptr);
+GPU_FUNC_QUALIFIER DeviceBuffer<gmx::RVec> pme_gpu_get_device_f(const gmx_pme_t* GPU_FUNC_ARGUMENT(pme))
+ GPU_FUNC_TERM_WITH_RETURN(DeviceBuffer<gmx::RVec>{});
/*! \brief Get pointer to the device synchronizer object that allows syncing on PME force calculation completion
* \param[in] pme The PME data structure.
- * \returns Pointer to sychronizer
+ * \returns Pointer to synchronizer
*/
GPU_FUNC_QUALIFIER GpuEventSynchronizer* pme_gpu_get_f_ready_synchronizer(const gmx_pme_t* GPU_FUNC_ARGUMENT(pme))
GPU_FUNC_TERM_WITH_RETURN(nullptr);
#include "gromacs/math/vec.h"
#include "gromacs/mdlib/gmx_omp_nthreads.h"
#include "gromacs/mdtypes/commrec.h"
-#include "gromacs/mdtypes/mdatom.h"
#include "gromacs/pbcutil/ishift.h"
#include "gromacs/pbcutil/pbc.h"
#include "gromacs/timing/wallcycle.h"
//! The interaction lists, only vsite entries are used
std::array<InteractionList, F_NRE> ilist;
//! Local fshift accumulation buffer
- std::array<RVec, SHIFTS> fshift;
+ std::array<RVec, c_numShiftVectors> fshift;
//! Local virial dx*df accumulation buffer
matrix dxdf;
//! Tells if interdependent task idTask should be used (in addition to the rest of this task), this bool has the same value on all threads
//! Set VSites and distribute VSite work over threads, should be called after DD partitioning
void setVirtualSites(ArrayRef<const InteractionList> ilist,
ArrayRef<const t_iparams> iparams,
- const t_mdatoms& mdatoms,
+ int numAtoms,
+ int homenr,
+ ArrayRef<const ParticleType> ptype,
bool useDomdec);
private:
{
public:
//! Constructor, domdec should be nullptr without DD
- Impl(const gmx_mtop_t& mtop, gmx_domdec_t* domdec, PbcType pbcType);
+ Impl(const gmx_mtop_t& mtop,
+ gmx_domdec_t* domdec,
+ PbcType pbcType,
+ ArrayRef<const RangePartitioning> updateGroupingPerMoleculeType);
//! Returns the number of virtual sites acting over multiple update groups
int numInterUpdategroupVirtualSites() const { return numInterUpdategroupVirtualSites_; }
//! Set VSites and distribute VSite work over threads, should be called after DD partitioning
- void setVirtualSites(ArrayRef<const InteractionList> ilist, const t_mdatoms& mdatoms);
+ void setVirtualSites(ArrayRef<const InteractionList> ilist,
+ int numAtoms,
+ int homenr,
+ ArrayRef<const ParticleType> ptype);
/*! \brief Create positions of vsite atoms based for the local system
*
- * \param[in,out] x The coordinates
- * \param[in] dt The time step
- * \param[in,out] v When != nullptr, velocities for vsites are set as displacement/dt
- * \param[in] box The box
+ * \param[in,out] x The coordinates
+ * \param[in,out] v The velocities, needed if operation requires it
+ * \param[in] box The box
+ * \param[in] operation Whether we calculate positions, velocities, or both
*/
- void construct(ArrayRef<RVec> x, real dt, ArrayRef<RVec> v, const matrix box) const;
+ void construct(ArrayRef<RVec> x, ArrayRef<RVec> v, const matrix box, VSiteOperation operation) const;
/*! \brief Spread the force operating on the vsite atoms on the surrounding atoms.
*
else
{
rvec_sub(xi, xj, dx);
- return CENTRAL;
+ return c_centralShiftIndex;
}
}
return gmx::invsqrt(iprod(x, x));
}
+//! Whether we're calculating the virtual site position
+enum class VSiteCalculatePosition
+{
+ Yes,
+ No
+};
+//! Whether we're calculating the virtual site velocity
+enum class VSiteCalculateVelocity
+{
+ Yes,
+ No
+};
+
#ifndef DOXYGEN
/* Vsite construction routines */
-static void constr_vsite1(const rvec xi, rvec x)
-{
- copy_rvec(xi, x);
+// GCC 8 falsely flags unused variables if constexpr prunes a code path, fixed in GCC 9
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85827
+// clang-format off
+GCC_DIAGNOSTIC_IGNORE(-Wunused-but-set-parameter)
+// clang-format on
- /* TOTAL: 0 flops */
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
+static void constr_vsite1(const rvec xi, rvec x, const rvec vi, rvec v)
+{
+ if (calculatePosition == VSiteCalculatePosition::Yes)
+ {
+ copy_rvec(xi, x);
+ /* TOTAL: 0 flops */
+ }
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ copy_rvec(vi, v);
+ }
}
-static void constr_vsite2(const rvec xi, const rvec xj, rvec x, real a, const t_pbc* pbc)
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
+static void
+constr_vsite2(const rvec xi, const rvec xj, rvec x, real a, const t_pbc* pbc, const rvec vi, const rvec vj, rvec v)
{
- real b = 1 - a;
+ const real b = 1 - a;
/* 1 flop */
- if (pbc)
+ if (calculatePosition == VSiteCalculatePosition::Yes)
{
- rvec dx;
- pbc_dx_aiuc(pbc, xj, xi, dx);
- x[XX] = xi[XX] + a * dx[XX];
- x[YY] = xi[YY] + a * dx[YY];
- x[ZZ] = xi[ZZ] + a * dx[ZZ];
+ if (pbc)
+ {
+ rvec dx;
+ pbc_dx_aiuc(pbc, xj, xi, dx);
+ x[XX] = xi[XX] + a * dx[XX];
+ x[YY] = xi[YY] + a * dx[YY];
+ x[ZZ] = xi[ZZ] + a * dx[ZZ];
+ }
+ else
+ {
+ x[XX] = b * xi[XX] + a * xj[XX];
+ x[YY] = b * xi[YY] + a * xj[YY];
+ x[ZZ] = b * xi[ZZ] + a * xj[ZZ];
+ /* 9 Flops */
+ }
+ /* TOTAL: 10 flops */
}
- else
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
{
- x[XX] = b * xi[XX] + a * xj[XX];
- x[YY] = b * xi[YY] + a * xj[YY];
- x[ZZ] = b * xi[ZZ] + a * xj[ZZ];
- /* 9 Flops */
+ v[XX] = b * vi[XX] + a * vj[XX];
+ v[YY] = b * vi[YY] + a * vj[YY];
+ v[ZZ] = b * vi[ZZ] + a * vj[ZZ];
}
-
- /* TOTAL: 10 flops */
}
-static void constr_vsite2FD(const rvec xi, const rvec xj, rvec x, real a, const t_pbc* pbc)
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
+static void
+constr_vsite2FD(const rvec xi, const rvec xj, rvec x, real a, const t_pbc* pbc, const rvec vi, const rvec vj, rvec v)
{
- rvec xij;
+ rvec xij = { 0 };
pbc_rvec_sub(pbc, xj, xi, xij);
/* 3 flops */
- const real b = a * inverseNorm(xij);
+ const real invNormXij = inverseNorm(xij);
+ const real b = a * invNormXij;
/* 6 + 10 flops */
- x[XX] = xi[XX] + b * xij[XX];
- x[YY] = xi[YY] + b * xij[YY];
- x[ZZ] = xi[ZZ] + b * xij[ZZ];
- /* 6 Flops */
+ if (calculatePosition == VSiteCalculatePosition::Yes)
+ {
+ x[XX] = xi[XX] + b * xij[XX];
+ x[YY] = xi[YY] + b * xij[YY];
+ x[ZZ] = xi[ZZ] + b * xij[ZZ];
+ /* 6 Flops */
+ /* TOTAL: 25 flops */
+ }
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ rvec vij = { 0 };
+ rvec_sub(vj, vi, vij);
+ const real vijDotXij = iprod(vij, xij);
- /* TOTAL: 25 flops */
+ v[XX] = vi[XX] + b * (vij[XX] - xij[XX] * vijDotXij * invNormXij * invNormXij);
+ v[YY] = vi[YY] + b * (vij[YY] - xij[YY] * vijDotXij * invNormXij * invNormXij);
+ v[ZZ] = vi[ZZ] + b * (vij[ZZ] - xij[ZZ] * vijDotXij * invNormXij * invNormXij);
+ }
}
-static void constr_vsite3(const rvec xi, const rvec xj, const rvec xk, rvec x, real a, real b, const t_pbc* pbc)
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
+static void constr_vsite3(const rvec xi,
+ const rvec xj,
+ const rvec xk,
+ rvec x,
+ real a,
+ real b,
+ const t_pbc* pbc,
+ const rvec vi,
+ const rvec vj,
+ const rvec vk,
+ rvec v)
{
- real c = 1 - a - b;
+ const real c = 1 - a - b;
/* 2 flops */
- if (pbc)
+ if (calculatePosition == VSiteCalculatePosition::Yes)
{
- rvec dxj, dxk;
+ if (pbc)
+ {
+ rvec dxj, dxk;
- pbc_dx_aiuc(pbc, xj, xi, dxj);
- pbc_dx_aiuc(pbc, xk, xi, dxk);
- x[XX] = xi[XX] + a * dxj[XX] + b * dxk[XX];
- x[YY] = xi[YY] + a * dxj[YY] + b * dxk[YY];
- x[ZZ] = xi[ZZ] + a * dxj[ZZ] + b * dxk[ZZ];
+ pbc_dx_aiuc(pbc, xj, xi, dxj);
+ pbc_dx_aiuc(pbc, xk, xi, dxk);
+ x[XX] = xi[XX] + a * dxj[XX] + b * dxk[XX];
+ x[YY] = xi[YY] + a * dxj[YY] + b * dxk[YY];
+ x[ZZ] = xi[ZZ] + a * dxj[ZZ] + b * dxk[ZZ];
+ }
+ else
+ {
+ x[XX] = c * xi[XX] + a * xj[XX] + b * xk[XX];
+ x[YY] = c * xi[YY] + a * xj[YY] + b * xk[YY];
+ x[ZZ] = c * xi[ZZ] + a * xj[ZZ] + b * xk[ZZ];
+ /* 15 Flops */
+ }
+ /* TOTAL: 17 flops */
}
- else
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
{
- x[XX] = c * xi[XX] + a * xj[XX] + b * xk[XX];
- x[YY] = c * xi[YY] + a * xj[YY] + b * xk[YY];
- x[ZZ] = c * xi[ZZ] + a * xj[ZZ] + b * xk[ZZ];
- /* 15 Flops */
+ v[XX] = c * vi[XX] + a * vj[XX] + b * vk[XX];
+ v[YY] = c * vi[YY] + a * vj[YY] + b * vk[YY];
+ v[ZZ] = c * vi[ZZ] + a * vj[ZZ] + b * vk[ZZ];
}
-
- /* TOTAL: 17 flops */
}
-static void constr_vsite3FD(const rvec xi, const rvec xj, const rvec xk, rvec x, real a, real b, const t_pbc* pbc)
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
+static void constr_vsite3FD(const rvec xi,
+ const rvec xj,
+ const rvec xk,
+ rvec x,
+ real a,
+ real b,
+ const t_pbc* pbc,
+ const rvec vi,
+ const rvec vj,
+ const rvec vk,
+ rvec v)
{
rvec xij, xjk, temp;
- real c;
pbc_rvec_sub(pbc, xj, xi, xij);
pbc_rvec_sub(pbc, xk, xj, xjk);
temp[ZZ] = xij[ZZ] + a * xjk[ZZ];
/* 6 flops */
- c = b * inverseNorm(temp);
+ const real invNormTemp = inverseNorm(temp);
+ const real c = b * invNormTemp;
/* 6 + 10 flops */
- x[XX] = xi[XX] + c * temp[XX];
- x[YY] = xi[YY] + c * temp[YY];
- x[ZZ] = xi[ZZ] + c * temp[ZZ];
- /* 6 Flops */
+ if (calculatePosition == VSiteCalculatePosition::Yes)
+ {
+ x[XX] = xi[XX] + c * temp[XX];
+ x[YY] = xi[YY] + c * temp[YY];
+ x[ZZ] = xi[ZZ] + c * temp[ZZ];
+ /* 6 Flops */
+ /* TOTAL: 34 flops */
+ }
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ rvec vij = { 0 };
+ rvec vjk = { 0 };
+ rvec_sub(vj, vi, vij);
+ rvec_sub(vk, vj, vjk);
+ const rvec tempV = { vij[XX] + a * vjk[XX], vij[YY] + a * vjk[YY], vij[ZZ] + a * vjk[ZZ] };
+ const real tempDotTempV = iprod(temp, tempV);
- /* TOTAL: 34 flops */
+ v[XX] = vi[XX] + c * (tempV[XX] - temp[XX] * tempDotTempV * invNormTemp * invNormTemp);
+ v[YY] = vi[YY] + c * (tempV[YY] - temp[YY] * tempDotTempV * invNormTemp * invNormTemp);
+ v[ZZ] = vi[ZZ] + c * (tempV[ZZ] - temp[ZZ] * tempDotTempV * invNormTemp * invNormTemp);
+ }
}
-static void constr_vsite3FAD(const rvec xi, const rvec xj, const rvec xk, rvec x, real a, real b, const t_pbc* pbc)
-{
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
+static void constr_vsite3FAD(const rvec xi,
+ const rvec xj,
+ const rvec xk,
+ rvec x,
+ real a,
+ real b,
+ const t_pbc* pbc,
+ const rvec vi,
+ const rvec vj,
+ const rvec vk,
+ rvec v)
+{ // Note: a = d * cos(theta)
+ // b = d * sin(theta)
rvec xij, xjk, xp;
- real a1, b1, c1, invdij;
pbc_rvec_sub(pbc, xj, xi, xij);
pbc_rvec_sub(pbc, xk, xj, xjk);
/* 6 flops */
- invdij = inverseNorm(xij);
- c1 = invdij * invdij * iprod(xij, xjk);
- xp[XX] = xjk[XX] - c1 * xij[XX];
- xp[YY] = xjk[YY] - c1 * xij[YY];
- xp[ZZ] = xjk[ZZ] - c1 * xij[ZZ];
- a1 = a * invdij;
- b1 = b * inverseNorm(xp);
+ const real invdij = inverseNorm(xij);
+ const real xijDotXjk = iprod(xij, xjk);
+ const real c1 = invdij * invdij * xijDotXjk;
+ xp[XX] = xjk[XX] - c1 * xij[XX];
+ xp[YY] = xjk[YY] - c1 * xij[YY];
+ xp[ZZ] = xjk[ZZ] - c1 * xij[ZZ];
+ const real a1 = a * invdij;
+ const real invNormXp = inverseNorm(xp);
+ const real b1 = b * invNormXp;
/* 45 */
- x[XX] = xi[XX] + a1 * xij[XX] + b1 * xp[XX];
- x[YY] = xi[YY] + a1 * xij[YY] + b1 * xp[YY];
- x[ZZ] = xi[ZZ] + a1 * xij[ZZ] + b1 * xp[ZZ];
- /* 12 Flops */
-
- /* TOTAL: 63 flops */
+ if (calculatePosition == VSiteCalculatePosition::Yes)
+ {
+ x[XX] = xi[XX] + a1 * xij[XX] + b1 * xp[XX];
+ x[YY] = xi[YY] + a1 * xij[YY] + b1 * xp[YY];
+ x[ZZ] = xi[ZZ] + a1 * xij[ZZ] + b1 * xp[ZZ];
+ /* 12 Flops */
+ /* TOTAL: 63 flops */
+ }
+
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ rvec vij = { 0 };
+ rvec vjk = { 0 };
+ rvec_sub(vj, vi, vij);
+ rvec_sub(vk, vj, vjk);
+
+ const real vijDotXjkPlusXijDotVjk = iprod(vij, xjk) + iprod(xij, vjk);
+ const real xijDotVij = iprod(xij, vij);
+ const real invNormXij2 = invdij * invdij;
+
+ rvec vp = { 0 };
+ vp[XX] = vjk[XX]
+ - xij[XX] * invNormXij2
+ * (vijDotXjkPlusXijDotVjk - invNormXij2 * xijDotXjk * xijDotVij * 2)
+ - vij[XX] * xijDotXjk * invNormXij2;
+ vp[YY] = vjk[YY]
+ - xij[YY] * invNormXij2
+ * (vijDotXjkPlusXijDotVjk - invNormXij2 * xijDotXjk * xijDotVij * 2)
+ - vij[YY] * xijDotXjk * invNormXij2;
+ vp[ZZ] = vjk[ZZ]
+ - xij[ZZ] * invNormXij2
+ * (vijDotXjkPlusXijDotVjk - invNormXij2 * xijDotXjk * xijDotVij * 2)
+ - vij[ZZ] * xijDotXjk * invNormXij2;
+
+ const real xpDotVp = iprod(xp, vp);
+
+ v[XX] = vi[XX] + a1 * (vij[XX] - xij[XX] * xijDotVij * invdij * invdij)
+ + b1 * (vp[XX] - xp[XX] * xpDotVp * invNormXp * invNormXp);
+ v[YY] = vi[YY] + a1 * (vij[YY] - xij[YY] * xijDotVij * invdij * invdij)
+ + b1 * (vp[YY] - xp[YY] * xpDotVp * invNormXp * invNormXp);
+ v[ZZ] = vi[ZZ] + a1 * (vij[ZZ] - xij[ZZ] * xijDotVij * invdij * invdij)
+ + b1 * (vp[ZZ] - xp[ZZ] * xpDotVp * invNormXp * invNormXp);
+ }
}
-static void
-constr_vsite3OUT(const rvec xi, const rvec xj, const rvec xk, rvec x, real a, real b, real c, const t_pbc* pbc)
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
+static void constr_vsite3OUT(const rvec xi,
+ const rvec xj,
+ const rvec xk,
+ rvec x,
+ real a,
+ real b,
+ real c,
+ const t_pbc* pbc,
+ const rvec vi,
+ const rvec vj,
+ const rvec vk,
+ rvec v)
{
rvec xij, xik, temp;
cprod(xij, xik, temp);
/* 15 Flops */
- x[XX] = xi[XX] + a * xij[XX] + b * xik[XX] + c * temp[XX];
- x[YY] = xi[YY] + a * xij[YY] + b * xik[YY] + c * temp[YY];
- x[ZZ] = xi[ZZ] + a * xij[ZZ] + b * xik[ZZ] + c * temp[ZZ];
- /* 18 Flops */
+ if (calculatePosition == VSiteCalculatePosition::Yes)
+ {
+ x[XX] = xi[XX] + a * xij[XX] + b * xik[XX] + c * temp[XX];
+ x[YY] = xi[YY] + a * xij[YY] + b * xik[YY] + c * temp[YY];
+ x[ZZ] = xi[ZZ] + a * xij[ZZ] + b * xik[ZZ] + c * temp[ZZ];
+ /* 18 Flops */
+ /* TOTAL: 33 flops */
+ }
+
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ rvec vij = { 0 };
+ rvec vik = { 0 };
+ rvec_sub(vj, vi, vij);
+ rvec_sub(vk, vi, vik);
+
+ rvec temp1 = { 0 };
+ rvec temp2 = { 0 };
+ cprod(vij, xik, temp1);
+ cprod(xij, vik, temp2);
- /* TOTAL: 33 flops */
+ v[XX] = vi[XX] + a * vij[XX] + b * vik[XX] + c * (temp1[XX] + temp2[XX]);
+ v[YY] = vi[YY] + a * vij[YY] + b * vik[YY] + c * (temp1[YY] + temp2[YY]);
+ v[ZZ] = vi[ZZ] + a * vij[ZZ] + b * vik[ZZ] + c * (temp1[ZZ] + temp2[ZZ]);
+ }
}
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
static void constr_vsite4FD(const rvec xi,
const rvec xj,
const rvec xk,
real a,
real b,
real c,
- const t_pbc* pbc)
+ const t_pbc* pbc,
+ const rvec vi,
+ const rvec vj,
+ const rvec vk,
+ const rvec vl,
+ rvec v)
{
rvec xij, xjk, xjl, temp;
real d;
temp[ZZ] = xij[ZZ] + a * xjk[ZZ] + b * xjl[ZZ];
/* 12 flops */
- d = c * inverseNorm(temp);
+ const real invRm = inverseNorm(temp);
+ d = c * invRm;
/* 6 + 10 flops */
- x[XX] = xi[XX] + d * temp[XX];
- x[YY] = xi[YY] + d * temp[YY];
- x[ZZ] = xi[ZZ] + d * temp[ZZ];
- /* 6 Flops */
+ if (calculatePosition == VSiteCalculatePosition::Yes)
+ {
+ x[XX] = xi[XX] + d * temp[XX];
+ x[YY] = xi[YY] + d * temp[YY];
+ x[ZZ] = xi[ZZ] + d * temp[ZZ];
+ /* 6 Flops */
+ /* TOTAL: 43 flops */
+ }
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ rvec vij = { 0 };
+ rvec vjk = { 0 };
+ rvec vjl = { 0 };
+
+ rvec_sub(vj, vi, vij);
+ rvec_sub(vk, vj, vjk);
+ rvec_sub(vl, vj, vjl);
- /* TOTAL: 43 flops */
+ rvec vm = { 0 };
+ vm[XX] = vij[XX] + a * vjk[XX] + b * vjl[XX];
+ vm[YY] = vij[YY] + a * vjk[YY] + b * vjl[YY];
+ vm[ZZ] = vij[ZZ] + a * vjk[ZZ] + b * vjl[ZZ];
+
+ const real vmDotRm = iprod(vm, temp);
+ v[XX] = vi[XX] + d * (vm[XX] - temp[XX] * vmDotRm * invRm * invRm);
+ v[YY] = vi[YY] + d * (vm[YY] - temp[YY] * vmDotRm * invRm * invRm);
+ v[ZZ] = vi[ZZ] + d * (vm[ZZ] - temp[ZZ] * vmDotRm * invRm * invRm);
+ }
}
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
static void constr_vsite4FDN(const rvec xi,
const rvec xj,
const rvec xk,
real a,
real b,
real c,
- const t_pbc* pbc)
+ const t_pbc* pbc,
+ const rvec vi,
+ const rvec vj,
+ const rvec vk,
+ const rvec vl,
+ rvec v)
{
rvec xij, xik, xil, ra, rb, rja, rjb, rm;
real d;
cprod(rja, rjb, rm);
/* 9 flops */
- d = c * inverseNorm(rm);
+ const real invNormRm = inverseNorm(rm);
+ d = c * invNormRm;
/* 5+5+1 flops */
- x[XX] = xi[XX] + d * rm[XX];
- x[YY] = xi[YY] + d * rm[YY];
- x[ZZ] = xi[ZZ] + d * rm[ZZ];
- /* 6 Flops */
+ if (calculatePosition == VSiteCalculatePosition::Yes)
+ {
+ x[XX] = xi[XX] + d * rm[XX];
+ x[YY] = xi[YY] + d * rm[YY];
+ x[ZZ] = xi[ZZ] + d * rm[ZZ];
+ /* 6 Flops */
+ /* TOTAL: 47 flops */
+ }
+
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ rvec vij = { 0 };
+ rvec vik = { 0 };
+ rvec vil = { 0 };
+ rvec_sub(vj, vi, vij);
+ rvec_sub(vk, vi, vik);
+ rvec_sub(vl, vi, vil);
+
+ rvec vja = { 0 };
+ rvec vjb = { 0 };
+
+ vja[XX] = a * vik[XX] - vij[XX];
+ vja[YY] = a * vik[YY] - vij[YY];
+ vja[ZZ] = a * vik[ZZ] - vij[ZZ];
+ vjb[XX] = b * vil[XX] - vij[XX];
+ vjb[YY] = b * vil[YY] - vij[YY];
+ vjb[ZZ] = b * vil[ZZ] - vij[ZZ];
+
+ rvec temp1 = { 0 };
+ rvec temp2 = { 0 };
+ cprod(vja, rjb, temp1);
+ cprod(rja, vjb, temp2);
- /* TOTAL: 47 flops */
+ rvec vm = { 0 };
+ vm[XX] = temp1[XX] + temp2[XX];
+ vm[YY] = temp1[YY] + temp2[YY];
+ vm[ZZ] = temp1[ZZ] + temp2[ZZ];
+
+ const real rmDotVm = iprod(rm, vm);
+ v[XX] = vi[XX] + d * (vm[XX] - rm[XX] * rmDotVm * invNormRm * invNormRm);
+ v[YY] = vi[YY] + d * (vm[YY] - rm[YY] * rmDotVm * invNormRm * invNormRm);
+ v[ZZ] = vi[ZZ] + d * (vm[ZZ] - rm[ZZ] * rmDotVm * invNormRm * invNormRm);
+ }
}
-static int constr_vsiten(const t_iatom* ia, ArrayRef<const t_iparams> ip, ArrayRef<RVec> x, const t_pbc* pbc)
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
+static int constr_vsiten(const t_iatom* ia,
+ ArrayRef<const t_iparams> ip,
+ ArrayRef<RVec> x,
+ const t_pbc* pbc,
+ ArrayRef<RVec> v)
{
rvec x1, dx;
dvec dsum;
- int n3, av, ai;
real a;
+ dvec dvsum = { 0 };
+ rvec v1 = { 0 };
- n3 = 3 * ip[ia[0]].vsiten.n;
- av = ia[1];
- ai = ia[2];
+ const int n3 = 3 * ip[ia[0]].vsiten.n;
+ const int av = ia[1];
+ int ai = ia[2];
copy_rvec(x[ai], x1);
+ copy_rvec(v[ai], v1);
clear_dvec(dsum);
for (int i = 3; i < n3; i += 3)
{
ai = ia[i + 2];
a = ip[ia[i]].vsiten.a;
- if (pbc)
+ if (calculatePosition == VSiteCalculatePosition::Yes)
{
- pbc_dx_aiuc(pbc, x[ai], x1, dx);
+ if (pbc)
+ {
+ pbc_dx_aiuc(pbc, x[ai], x1, dx);
+ }
+ else
+ {
+ rvec_sub(x[ai], x1, dx);
+ }
+ dsum[XX] += a * dx[XX];
+ dsum[YY] += a * dx[YY];
+ dsum[ZZ] += a * dx[ZZ];
+ /* 9 Flops */
}
- else
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
{
- rvec_sub(x[ai], x1, dx);
+ rvec_sub(v[ai], v1, dx);
+ dvsum[XX] += a * dx[XX];
+ dvsum[YY] += a * dx[YY];
+ dvsum[ZZ] += a * dx[ZZ];
+ /* 9 Flops */
}
- dsum[XX] += a * dx[XX];
- dsum[YY] += a * dx[YY];
- dsum[ZZ] += a * dx[ZZ];
- /* 9 Flops */
}
- x[av][XX] = x1[XX] + dsum[XX];
- x[av][YY] = x1[YY] + dsum[YY];
- x[av][ZZ] = x1[ZZ] + dsum[ZZ];
+ if (calculatePosition == VSiteCalculatePosition::Yes)
+ {
+ x[av][XX] = x1[XX] + dsum[XX];
+ x[av][YY] = x1[YY] + dsum[YY];
+ x[av][ZZ] = x1[ZZ] + dsum[ZZ];
+ }
+
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ v[av][XX] = v1[XX] + dvsum[XX];
+ v[av][YY] = v1[YY] + dvsum[YY];
+ v[av][ZZ] = v1[ZZ] + dvsum[ZZ];
+ }
return n3;
}
+// End GCC 8 bug
+GCC_DIAGNOSTIC_RESET
#endif // DOXYGEN
/*! \brief Executes the vsite construction task for a single thread
*
+ * \tparam operation Whether we are calculating positions, velocities, or both
* \param[in,out] x Coordinates to construct vsites for
- * \param[in] dt Time step, needed when v is not empty
- * \param[in,out] v When not empty, velocities are generated for virtual sites
+ * \param[in,out] v Velocities are generated for virtual sites if `operation` requires it
* \param[in] ip Interaction parameters for all interaction, only vsite parameters are used
* \param[in] ilist The interaction lists, only vsites are usesd
* \param[in] pbc_null PBC struct, used for PBC distance calculations when !=nullptr
*/
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
static void construct_vsites_thread(ArrayRef<RVec> x,
- const real dt,
ArrayRef<RVec> v,
ArrayRef<const t_iparams> ip,
ArrayRef<const InteractionList> ilist,
const t_pbc* pbc_null)
{
- real inv_dt;
- if (!v.empty())
- {
- inv_dt = 1.0 / dt;
- }
- else
- {
- inv_dt = 1.0;
- }
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ GMX_RELEASE_ASSERT(!v.empty(),
+ "Can't calculate velocities without access to velocity vector.");
+ }
+
+ // Work around clang bug (unfixed as of Feb 2021)
+ // https://bugs.llvm.org/show_bug.cgi?id=35450
+ // clang-format off
+ CLANG_DIAGNOSTIC_IGNORE(-Wunused-lambda-capture)
+ // clang-format on
+ // GCC 8 falsely flags unused variables if constexpr prunes a code path, fixed in GCC 9
+ // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85827
+ // clang-format off
+ GCC_DIAGNOSTIC_IGNORE(-Wunused-but-set-parameter)
+ // clang-format on
+ // getVOrNull returns a velocity rvec if we need it, nullptr otherwise.
+ auto getVOrNull = [v](int idx) -> real* {
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ return v[idx].as_vec();
+ }
+ else
+ {
+ return nullptr;
+ }
+ };
+ GCC_DIAGNOSTIC_RESET
+ CLANG_DIAGNOSTIC_RESET
const PbcMode pbcMode = getPbcMode(pbc_null);
/* We need another pbc pointer, as with charge groups we switch per vsite */
real b1, c1;
switch (ftype)
{
- case F_VSITE1: constr_vsite1(x[ai], x[avsite]); break;
+ case F_VSITE1:
+ constr_vsite1<calculatePosition, calculateVelocity>(
+ x[ai], x[avsite], getVOrNull(ai), getVOrNull(avsite));
+ break;
case F_VSITE2:
aj = ia[3];
- constr_vsite2(x[ai], x[aj], x[avsite], a1, pbc_null2);
+ constr_vsite2<calculatePosition, calculateVelocity>(x[ai],
+ x[aj],
+ x[avsite],
+ a1,
+ pbc_null2,
+ getVOrNull(ai),
+ getVOrNull(aj),
+ getVOrNull(avsite));
break;
case F_VSITE2FD:
aj = ia[3];
- constr_vsite2FD(x[ai], x[aj], x[avsite], a1, pbc_null2);
+ constr_vsite2FD<calculatePosition, calculateVelocity>(x[ai],
+ x[aj],
+ x[avsite],
+ a1,
+ pbc_null2,
+ getVOrNull(ai),
+ getVOrNull(aj),
+ getVOrNull(avsite));
break;
case F_VSITE3:
aj = ia[3];
ak = ia[4];
b1 = ip[tp].vsite.b;
- constr_vsite3(x[ai], x[aj], x[ak], x[avsite], a1, b1, pbc_null2);
+ constr_vsite3<calculatePosition, calculateVelocity>(x[ai],
+ x[aj],
+ x[ak],
+ x[avsite],
+ a1,
+ b1,
+ pbc_null2,
+ getVOrNull(ai),
+ getVOrNull(aj),
+ getVOrNull(ak),
+ getVOrNull(avsite));
break;
case F_VSITE3FD:
aj = ia[3];
ak = ia[4];
b1 = ip[tp].vsite.b;
- constr_vsite3FD(x[ai], x[aj], x[ak], x[avsite], a1, b1, pbc_null2);
+ constr_vsite3FD<calculatePosition, calculateVelocity>(x[ai],
+ x[aj],
+ x[ak],
+ x[avsite],
+ a1,
+ b1,
+ pbc_null2,
+ getVOrNull(ai),
+ getVOrNull(aj),
+ getVOrNull(ak),
+ getVOrNull(avsite));
break;
case F_VSITE3FAD:
aj = ia[3];
ak = ia[4];
b1 = ip[tp].vsite.b;
- constr_vsite3FAD(x[ai], x[aj], x[ak], x[avsite], a1, b1, pbc_null2);
+ constr_vsite3FAD<calculatePosition, calculateVelocity>(x[ai],
+ x[aj],
+ x[ak],
+ x[avsite],
+ a1,
+ b1,
+ pbc_null2,
+ getVOrNull(ai),
+ getVOrNull(aj),
+ getVOrNull(ak),
+ getVOrNull(avsite));
break;
case F_VSITE3OUT:
aj = ia[3];
ak = ia[4];
b1 = ip[tp].vsite.b;
c1 = ip[tp].vsite.c;
- constr_vsite3OUT(x[ai], x[aj], x[ak], x[avsite], a1, b1, c1, pbc_null2);
+ constr_vsite3OUT<calculatePosition, calculateVelocity>(x[ai],
+ x[aj],
+ x[ak],
+ x[avsite],
+ a1,
+ b1,
+ c1,
+ pbc_null2,
+ getVOrNull(ai),
+ getVOrNull(aj),
+ getVOrNull(ak),
+ getVOrNull(avsite));
break;
case F_VSITE4FD:
aj = ia[3];
al = ia[5];
b1 = ip[tp].vsite.b;
c1 = ip[tp].vsite.c;
- constr_vsite4FD(x[ai], x[aj], x[ak], x[al], x[avsite], a1, b1, c1, pbc_null2);
+ constr_vsite4FD<calculatePosition, calculateVelocity>(x[ai],
+ x[aj],
+ x[ak],
+ x[al],
+ x[avsite],
+ a1,
+ b1,
+ c1,
+ pbc_null2,
+ getVOrNull(ai),
+ getVOrNull(aj),
+ getVOrNull(ak),
+ getVOrNull(al),
+ getVOrNull(avsite));
break;
case F_VSITE4FDN:
aj = ia[3];
al = ia[5];
b1 = ip[tp].vsite.b;
c1 = ip[tp].vsite.c;
- constr_vsite4FDN(x[ai], x[aj], x[ak], x[al], x[avsite], a1, b1, c1, pbc_null2);
+ constr_vsite4FDN<calculatePosition, calculateVelocity>(x[ai],
+ x[aj],
+ x[ak],
+ x[al],
+ x[avsite],
+ a1,
+ b1,
+ c1,
+ pbc_null2,
+ getVOrNull(ai),
+ getVOrNull(aj),
+ getVOrNull(ak),
+ getVOrNull(al),
+ getVOrNull(avsite));
+ break;
+ case F_VSITEN:
+ inc = constr_vsiten<calculatePosition, calculateVelocity>(ia, ip, x, pbc_null2, v);
break;
- case F_VSITEN: inc = constr_vsiten(ia, ip, x, pbc_null2); break;
default:
gmx_fatal(FARGS, "No such vsite type %d in %s, line %d", ftype, __FILE__, __LINE__);
}
/* Keep the vsite in the same periodic image as before */
rvec dx;
int ishift = pbc_dx_aiuc(pbc_null, x[avsite], xv, dx);
- if (ishift != CENTRAL)
+ if (ishift != c_centralShiftIndex)
{
rvec_add(xv, dx, x[avsite]);
}
}
- if (!v.empty())
- {
- /* Calculate velocity of vsite... */
- rvec vv;
- rvec_sub(x[avsite], xv, vv);
- svmul(inv_dt, vv, v[avsite]);
- }
/* Increment loop variables */
i += inc;
*
* \param[in] threadingInfo Used to divide work over threads when != nullptr
* \param[in,out] x Coordinates to construct vsites for
- * \param[in] dt Time step, needed when v is not empty
* \param[in,out] v When not empty, velocities are generated for virtual sites
* \param[in] ip Interaction parameters for all interaction, only vsite parameters are used
* \param[in] ilist The interaction lists, only vsites are usesd
* \param[in] domainInfo Information about PBC and DD
* \param[in] box Used for PBC when PBC is set in domainInfo
*/
+template<VSiteCalculatePosition calculatePosition, VSiteCalculateVelocity calculateVelocity>
static void construct_vsites(const ThreadingInfo* threadingInfo,
ArrayRef<RVec> x,
- real dt,
ArrayRef<RVec> v,
ArrayRef<const t_iparams> ip,
ArrayRef<const InteractionList> ilist,
*/
ivec null_ivec;
clear_ivec(null_ivec);
- pbc_null = set_pbc_dd(&pbc, domainInfo.pbcType_,
- useDomdec ? domainInfo.domdec_->numCells : null_ivec, FALSE, box);
+ pbc_null = set_pbc_dd(
+ &pbc, domainInfo.pbcType_, useDomdec ? domainInfo.domdec_->numCells : null_ivec, FALSE, box);
}
else
{
if (useDomdec)
{
- dd_move_x_vsites(*domainInfo.domdec_, box, as_rvec_array(x.data()));
+ if (calculateVelocity == VSiteCalculateVelocity::Yes)
+ {
+ dd_move_x_and_v_vsites(
+ *domainInfo.domdec_, box, as_rvec_array(x.data()), as_rvec_array(v.data()));
+ }
+ else
+ {
+ dd_move_x_vsites(*domainInfo.domdec_, box, as_rvec_array(x.data()));
+ }
}
if (threadingInfo == nullptr || threadingInfo->numThreads() == 1)
{
- construct_vsites_thread(x, dt, v, ip, ilist, pbc_null);
+ construct_vsites_thread<calculatePosition, calculateVelocity>(x, v, ip, ilist, pbc_null);
}
else
{
GMX_ASSERT(tData.rangeStart >= 0,
"The thread data should be initialized before calling construct_vsites");
- construct_vsites_thread(x, dt, v, ip, tData.ilist, pbc_null);
+ construct_vsites_thread<calculatePosition, calculateVelocity>(
+ x, v, ip, tData.ilist, pbc_null);
if (tData.useInterdependentTask)
{
/* Here we don't need a barrier (unlike the spreading),
* since both tasks only construct vsites from particles,
* or local vsites, not from non-local vsites.
*/
- construct_vsites_thread(x, dt, v, ip, tData.idTask.ilist, pbc_null);
+ construct_vsites_thread<calculatePosition, calculateVelocity>(
+ x, v, ip, tData.idTask.ilist, pbc_null);
}
}
GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
}
/* Now we can construct the vsites that might depend on other vsites */
- construct_vsites_thread(x, dt, v, ip, threadingInfo->threadDataNonLocalDependent().ilist, pbc_null);
+ construct_vsites_thread<calculatePosition, calculateVelocity>(
+ x, v, ip, threadingInfo->threadDataNonLocalDependent().ilist, pbc_null);
}
}
-void VirtualSitesHandler::Impl::construct(ArrayRef<RVec> x, real dt, ArrayRef<RVec> v, const matrix box) const
+void VirtualSitesHandler::Impl::construct(ArrayRef<RVec> x,
+ ArrayRef<RVec> v,
+ const matrix box,
+ VSiteOperation operation) const
{
- construct_vsites(&threadingInfo_, x, dt, v, iparams_, ilists_, domainInfo_, box);
+ switch (operation)
+ {
+ case VSiteOperation::Positions:
+ construct_vsites<VSiteCalculatePosition::Yes, VSiteCalculateVelocity::No>(
+ &threadingInfo_, x, v, iparams_, ilists_, domainInfo_, box);
+ break;
+ case VSiteOperation::Velocities:
+ construct_vsites<VSiteCalculatePosition::No, VSiteCalculateVelocity::Yes>(
+ &threadingInfo_, x, v, iparams_, ilists_, domainInfo_, box);
+ break;
+ case VSiteOperation::PositionsAndVelocities:
+ construct_vsites<VSiteCalculatePosition::Yes, VSiteCalculateVelocity::Yes>(
+ &threadingInfo_, x, v, iparams_, ilists_, domainInfo_, box);
+ break;
+ default: gmx_fatal(FARGS, "Unknown virtual site operation");
+ }
}
-void VirtualSitesHandler::construct(ArrayRef<RVec> x, real dt, ArrayRef<RVec> v, const matrix box) const
+void VirtualSitesHandler::construct(ArrayRef<RVec> x, ArrayRef<RVec> v, const matrix box, VSiteOperation operation) const
{
- impl_->construct(x, dt, v, box);
+ impl_->construct(x, v, box, operation);
}
void constructVirtualSites(ArrayRef<RVec> x, ArrayRef<const t_iparams> ip, ArrayRef<const InteractionList> ilist)
{
// No PBC, no DD
const DomainInfo domainInfo;
- construct_vsites(nullptr, x, 0, {}, ip, ilist, domainInfo, nullptr);
+ construct_vsites<VSiteCalculatePosition::Yes, VSiteCalculateVelocity::No>(
+ nullptr, x, {}, ip, ilist, domainInfo, nullptr);
}
#ifndef DOXYGEN
}
else
{
- siv = CENTRAL;
- sij = CENTRAL;
+ siv = c_centralShiftIndex;
+ sij = c_centralShiftIndex;
}
- if (siv != CENTRAL || sij != CENTRAL)
+ if (siv != c_centralShiftIndex || sij != c_centralShiftIndex)
{
rvec_inc(fshift[siv], f[av]);
- rvec_dec(fshift[CENTRAL], fi);
+ rvec_dec(fshift[c_centralShiftIndex], fi);
rvec_dec(fshift[sij], fj);
}
}
int atomOffset = mtop.moleculeBlockIndices[mb].globalAtomStart;
for (int mol = 0; mol < molb.nmol; mol++)
{
- constructVirtualSites(x.subArray(atomOffset, molt.atoms.nr), mtop.ffparams.iparams,
- molt.ilist);
+ constructVirtualSites(
+ x.subArray(atomOffset, molt.atoms.nr), mtop.ffparams.iparams, molt.ilist);
atomOffset += molt.atoms.nr;
}
}
}
else
{
- svi = CENTRAL;
+ svi = c_centralShiftIndex;
}
- if (svi != CENTRAL || sji != CENTRAL)
+ if (svi != c_centralShiftIndex || sji != c_centralShiftIndex)
{
rvec_dec(fshift[svi], fv);
- fshift[CENTRAL][XX] += fv[XX] - fj[XX];
- fshift[CENTRAL][YY] += fv[YY] - fj[YY];
- fshift[CENTRAL][ZZ] += fv[ZZ] - fj[ZZ];
+ fshift[c_centralShiftIndex][XX] += fv[XX] - fj[XX];
+ fshift[c_centralShiftIndex][YY] += fv[YY] - fj[YY];
+ fshift[c_centralShiftIndex][ZZ] += fv[ZZ] - fj[ZZ];
fshift[sji][XX] += fj[XX];
fshift[sji][YY] += fj[YY];
fshift[sji][ZZ] += fj[ZZ];
}
else
{
- siv = CENTRAL;
- sij = CENTRAL;
- sik = CENTRAL;
+ siv = c_centralShiftIndex;
+ sij = c_centralShiftIndex;
+ sik = c_centralShiftIndex;
}
- if (siv != CENTRAL || sij != CENTRAL || sik != CENTRAL)
+ if (siv != c_centralShiftIndex || sij != c_centralShiftIndex || sik != c_centralShiftIndex)
{
rvec_inc(fshift[siv], f[av]);
- rvec_dec(fshift[CENTRAL], fi);
+ rvec_dec(fshift[c_centralShiftIndex], fi);
rvec_dec(fshift[sij], fj);
rvec_dec(fshift[sik], fk);
}
}
else
{
- svi = CENTRAL;
+ svi = c_centralShiftIndex;
}
- if (svi != CENTRAL || sji != CENTRAL || skj != CENTRAL)
+ if (svi != c_centralShiftIndex || sji != c_centralShiftIndex || skj != c_centralShiftIndex)
{
rvec_dec(fshift[svi], fv);
- fshift[CENTRAL][XX] += fv[XX] - (1 + a) * temp[XX];
- fshift[CENTRAL][YY] += fv[YY] - (1 + a) * temp[YY];
- fshift[CENTRAL][ZZ] += fv[ZZ] - (1 + a) * temp[ZZ];
+ fshift[c_centralShiftIndex][XX] += fv[XX] - (1 + a) * temp[XX];
+ fshift[c_centralShiftIndex][YY] += fv[YY] - (1 + a) * temp[YY];
+ fshift[c_centralShiftIndex][ZZ] += fv[ZZ] - (1 + a) * temp[ZZ];
fshift[sji][XX] += temp[XX];
fshift[sji][YY] += temp[YY];
fshift[sji][ZZ] += temp[ZZ];
}
else
{
- svi = CENTRAL;
+ svi = c_centralShiftIndex;
}
- if (svi != CENTRAL || sji != CENTRAL || skj != CENTRAL)
+ if (svi != c_centralShiftIndex || sji != c_centralShiftIndex || skj != c_centralShiftIndex)
{
rvec_dec(fshift[svi], fv);
- fshift[CENTRAL][XX] += fv[XX] - f1[XX] - (1 - c1) * f2[XX] + f3[XX];
- fshift[CENTRAL][YY] += fv[YY] - f1[YY] - (1 - c1) * f2[YY] + f3[YY];
- fshift[CENTRAL][ZZ] += fv[ZZ] - f1[ZZ] - (1 - c1) * f2[ZZ] + f3[ZZ];
+ fshift[c_centralShiftIndex][XX] += fv[XX] - f1[XX] - (1 - c1) * f2[XX] + f3[XX];
+ fshift[c_centralShiftIndex][YY] += fv[YY] - f1[YY] - (1 - c1) * f2[YY] + f3[YY];
+ fshift[c_centralShiftIndex][ZZ] += fv[ZZ] - f1[ZZ] - (1 - c1) * f2[ZZ] + f3[ZZ];
fshift[sji][XX] += f1[XX] - c1 * f2[XX] - f3[XX];
fshift[sji][YY] += f1[YY] - c1 * f2[YY] - f3[YY];
fshift[sji][ZZ] += f1[ZZ] - c1 * f2[ZZ] - f3[ZZ];
}
else
{
- svi = CENTRAL;
+ svi = c_centralShiftIndex;
}
- if (svi != CENTRAL || sji != CENTRAL || ski != CENTRAL)
+ if (svi != c_centralShiftIndex || sji != c_centralShiftIndex || ski != c_centralShiftIndex)
{
rvec_dec(fshift[svi], fv);
- fshift[CENTRAL][XX] += fv[XX] - fj[XX] - fk[XX];
- fshift[CENTRAL][YY] += fv[YY] - fj[YY] - fk[YY];
- fshift[CENTRAL][ZZ] += fv[ZZ] - fj[ZZ] - fk[ZZ];
+ fshift[c_centralShiftIndex][XX] += fv[XX] - fj[XX] - fk[XX];
+ fshift[c_centralShiftIndex][YY] += fv[YY] - fj[YY] - fk[YY];
+ fshift[c_centralShiftIndex][ZZ] += fv[ZZ] - fj[ZZ] - fk[ZZ];
rvec_inc(fshift[sji], fj);
rvec_inc(fshift[ski], fk);
}
}
else
{
- svi = CENTRAL;
+ svi = c_centralShiftIndex;
}
- if (svi != CENTRAL || sji != CENTRAL || skj != CENTRAL || slj != CENTRAL)
+ if (svi != c_centralShiftIndex || sji != c_centralShiftIndex || skj != c_centralShiftIndex
+ || slj != c_centralShiftIndex)
{
rvec_dec(fshift[svi], fv);
for (m = 0; m < DIM; m++)
{
- fshift[CENTRAL][m] += fv[m] - (1 + a + b) * temp[m];
+ fshift[c_centralShiftIndex][m] += fv[m] - (1 + a + b) * temp[m];
fshift[sji][m] += temp[m];
fshift[skj][m] += a * temp[m];
fshift[slj][m] += b * temp[m];
}
else
{
- svi = CENTRAL;
+ svi = c_centralShiftIndex;
}
- if (svi != CENTRAL || sij != CENTRAL || sik != CENTRAL || sil != CENTRAL)
+ if (svi != c_centralShiftIndex || sij != c_centralShiftIndex || sik != c_centralShiftIndex
+ || sil != c_centralShiftIndex)
{
rvec_dec(fshift[svi], fv);
- fshift[CENTRAL][XX] += fv[XX] - fj[XX] - fk[XX] - fl[XX];
- fshift[CENTRAL][YY] += fv[YY] - fj[YY] - fk[YY] - fl[YY];
- fshift[CENTRAL][ZZ] += fv[ZZ] - fj[ZZ] - fk[ZZ] - fl[ZZ];
+ fshift[c_centralShiftIndex][XX] += fv[XX] - fj[XX] - fk[XX] - fl[XX];
+ fshift[c_centralShiftIndex][YY] += fv[YY] - fj[YY] - fk[YY] - fl[YY];
+ fshift[c_centralShiftIndex][ZZ] += fv[ZZ] - fj[ZZ] - fk[ZZ] - fl[ZZ];
rvec_inc(fshift[sij], fj);
rvec_inc(fshift[sik], fk);
rvec_inc(fshift[sil], fl);
}
else
{
- siv = CENTRAL;
+ siv = c_centralShiftIndex;
}
a = ip[ia[i]].vsiten.a;
svmul(a, f[av], fi);
rvec_inc(f[ai], fi);
- if (virialHandling == VirialHandling::Pbc && siv != CENTRAL)
+ if (virialHandling == VirialHandling::Pbc && siv != c_centralShiftIndex)
{
rvec_inc(fshift[siv], fi);
- rvec_dec(fshift[CENTRAL], fi);
+ rvec_dec(fshift[c_centralShiftIndex], fi);
}
/* 6 Flops */
}
const matrix box,
gmx_wallcycle* wcycle)
{
- wallcycle_start(wcycle, ewcVSITESPREAD);
+ wallcycle_start(wcycle, WallCycleCounter::VsiteSpread);
const bool useDomdec = domainInfo_.useDomdec();
/* This is wasting some CPU time as we now do this multiple times
* per MD step.
*/
- pbc_null = set_pbc_dd(&pbc, domainInfo_.pbcType_,
- useDomdec ? domainInfo_.domdec_->numCells : nullptr, FALSE, box);
+ pbc_null = set_pbc_dd(
+ &pbc, domainInfo_.pbcType_, useDomdec ? domainInfo_.domdec_->numCells : nullptr, FALSE, box);
}
else
{
{
/* First spread the vsites that might depend on non-local vsites */
auto& nlDependentVSites = threadingInfo_.threadDataNonLocalDependent();
- spreadForceWrapper(x, f, virialHandling, fshift, nlDependentVSites.dxdf, true, iparams_,
- nlDependentVSites.ilist, pbc_null);
+ spreadForceWrapper(x,
+ f,
+ virialHandling,
+ fshift,
+ nlDependentVSites.dxdf,
+ true,
+ iparams_,
+ nlDependentVSites.ilist,
+ pbc_null);
#pragma omp parallel num_threads(numThreads)
{
{
fshift_t = tData.fshift;
- for (int i = 0; i < SHIFTS; i++)
+ for (int i = 0; i < c_numShiftVectors; i++)
{
clear_rvec(fshift_t[i]);
}
{
copy_rvec(f[idTask->vsite[i]], idTask->force[idTask->vsite[i]]);
}
- spreadForceWrapper(x, idTask->force, virialHandling, fshift_t, tData.dxdf, true,
- iparams_, tData.idTask.ilist, pbc_null);
+ spreadForceWrapper(x,
+ idTask->force,
+ virialHandling,
+ fshift_t,
+ tData.dxdf,
+ true,
+ iparams_,
+ tData.idTask.ilist,
+ pbc_null);
/* We need a barrier before reducing forces below
* that have been produced by a different thread above.
}
/* Spread the vsites that spread locally only */
- spreadForceWrapper(x, f, virialHandling, fshift_t, tData.dxdf, false, iparams_,
- tData.ilist, pbc_null);
+ spreadForceWrapper(
+ x, f, virialHandling, fshift_t, tData.dxdf, false, iparams_, tData.ilist, pbc_null);
}
GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
}
{
for (int th = 1; th < numThreads; th++)
{
- for (int i = 0; i < SHIFTS; i++)
+ for (int i = 0; i < c_numShiftVectors; i++)
{
rvec_inc(fshift[i], threadingInfo_.threadData(th).fshift[i]);
}
inc_nrnb(nrnb, eNR_VSITE4FDN, vsite_count(ilists_, F_VSITE4FDN));
inc_nrnb(nrnb, eNR_VSITEN, vsite_count(ilists_, F_VSITEN));
- wallcycle_stop(wcycle, ewcVSITESPREAD);
+ wallcycle_stop(wcycle, WallCycleCounter::VsiteSpread);
}
/*! \brief Returns the an array with group indices for each atom
}
int countInterUpdategroupVsites(const gmx_mtop_t& mtop,
- gmx::ArrayRef<const gmx::RangePartitioning> updateGroupingPerMoleculetype)
+ gmx::ArrayRef<const gmx::RangePartitioning> updateGroupingsPerMoleculeType)
{
int n_intercg_vsite = 0;
for (const gmx_molblock_t& molb : mtop.molblock)
const gmx_moltype_t& molt = mtop.moltype[molb.type];
std::vector<int> atomToGroup;
- if (!updateGroupingPerMoleculetype.empty())
+ if (!updateGroupingsPerMoleculeType.empty())
{
- atomToGroup = makeAtomToGroupMapping(updateGroupingPerMoleculetype[molb.type]);
+ atomToGroup = makeAtomToGroupMapping(updateGroupingsPerMoleculeType[molb.type]);
}
for (int ftype = c_ftypeVsiteStart; ftype < c_ftypeVsiteEnd; ftype++)
{
std::unique_ptr<VirtualSitesHandler> makeVirtualSitesHandler(const gmx_mtop_t& mtop,
const t_commrec* cr,
- PbcType pbcType)
+ PbcType pbcType,
+ ArrayRef<const RangePartitioning> updateGroupingPerMoleculeType)
{
GMX_RELEASE_ASSERT(cr != nullptr, "We need a valid commrec");
GMX_ASSERT(ftype >= c_ftypeVsiteStart && ftype < c_ftypeVsiteEnd,
"c_ftypeVsiteStart and/or c_ftypeVsiteEnd do not have correct values");
- nvsite += gmx_mtop_ftype_count(&mtop, ftype);
+ nvsite += gmx_mtop_ftype_count(mtop, ftype);
}
else
{
return vsite;
}
- return std::make_unique<VirtualSitesHandler>(mtop, cr->dd, pbcType);
+ return std::make_unique<VirtualSitesHandler>(mtop, cr->dd, pbcType, updateGroupingPerMoleculeType);
}
-ThreadingInfo::ThreadingInfo() : numThreads_(gmx_omp_nthreads_get(emntVSITE))
+ThreadingInfo::ThreadingInfo() : numThreads_(gmx_omp_nthreads_get(ModuleMultiThread::VirtualSite))
{
if (numThreads_ > 1)
{
}
}
-//! Returns the number of inter update-group vsites
-static int getNumInterUpdategroupVsites(const gmx_mtop_t& mtop, const gmx_domdec_t* domdec)
-{
- gmx::ArrayRef<const gmx::RangePartitioning> updateGroupingPerMoleculetype;
- if (domdec)
- {
- updateGroupingPerMoleculetype = getUpdateGroupingPerMoleculetype(*domdec);
- }
-
- return countInterUpdategroupVsites(mtop, updateGroupingPerMoleculetype);
-}
-
-VirtualSitesHandler::Impl::Impl(const gmx_mtop_t& mtop, gmx_domdec_t* domdec, const PbcType pbcType) :
- numInterUpdategroupVirtualSites_(getNumInterUpdategroupVsites(mtop, domdec)),
+VirtualSitesHandler::Impl::Impl(const gmx_mtop_t& mtop,
+ gmx_domdec_t* domdec,
+ const PbcType pbcType,
+ const ArrayRef<const RangePartitioning> updateGroupingPerMoleculeType) :
+ numInterUpdategroupVirtualSites_(countInterUpdategroupVsites(mtop, updateGroupingPerMoleculeType)),
domainInfo_({ pbcType, pbcType != PbcType::No && numInterUpdategroupVirtualSites_ > 0, domdec }),
iparams_(mtop.ffparams.iparams)
{
}
-VirtualSitesHandler::VirtualSitesHandler(const gmx_mtop_t& mtop, gmx_domdec_t* domdec, const PbcType pbcType) :
- impl_(new Impl(mtop, domdec, pbcType))
+VirtualSitesHandler::VirtualSitesHandler(const gmx_mtop_t& mtop,
+ gmx_domdec_t* domdec,
+ const PbcType pbcType,
+ const ArrayRef<const RangePartitioning> updateGroupingPerMoleculeType) :
+ impl_(new Impl(mtop, domdec, pbcType, updateGroupingPerMoleculeType))
{
}
gmx::ArrayRef<int> taskIndex,
ArrayRef<const InteractionList> ilist,
ArrayRef<const t_iparams> ip,
- const unsigned short* ptype)
+ ArrayRef<const ParticleType> ptype)
{
for (int ftype = c_ftypeVsiteStart; ftype < c_ftypeVsiteEnd; ftype++)
{
/* Do a range check to avoid a harmless race on taskIndex */
if (iat[j] < tData->rangeStart || iat[j] >= tData->rangeEnd || taskIndex[iat[j]] != thread)
{
- if (!tData->useInterdependentTask || ptype[iat[j]] == eptVSite)
+ if (!tData->useInterdependentTask || ptype[iat[j]] == ParticleType::VSite)
{
/* At least one constructing atom is a vsite
* that is not assigned to the same thread.
/* Do a range check to avoid a harmless race on taskIndex */
if (iat[j] < tData->rangeStart || iat[j] >= tData->rangeEnd || taskIndex[iat[j]] != thread)
{
- GMX_ASSERT(ptype[iat[j]] != eptVSite,
+ GMX_ASSERT(ptype[iat[j]] != ParticleType::VSite,
"A vsite to be assigned in assignVsitesToThread has a vsite as "
"a constructing atom that does not belong to our task, such "
"vsites should be assigned to the single 'master' task");
- task = nthread + thread;
+ if (tData->useInterdependentTask)
+ {
+ // Assign to the interdependent task
+ task = nthread + thread;
+ }
+ else
+ {
+ // Assign to the separate, non-parallel task
+ task = 2 * nthread;
+ }
}
}
}
void ThreadingInfo::setVirtualSites(ArrayRef<const InteractionList> ilists,
ArrayRef<const t_iparams> iparams,
- const t_mdatoms& mdatoms,
+ const int numAtoms,
+ const int homenr,
+ ArrayRef<const ParticleType> ptype,
const bool useDomdec)
{
if (numThreads_ <= 1)
* When assigning vsites to threads, we should take care that the last
* threads also covers the non-local range.
*/
- vsite_atom_range = mdatoms.nr;
- natperthread = (mdatoms.homenr + numThreads_ - 1) / numThreads_;
+ vsite_atom_range = numAtoms;
+ natperthread = (homenr + numThreads_ - 1) / numThreads_;
}
if (debug)
{
- fprintf(debug, "virtual site thread dist: natoms %d, range %d, natperthread %d\n",
- mdatoms.nr, vsite_atom_range, natperthread);
+ fprintf(debug,
+ "virtual site thread dist: natoms %d, range %d, natperthread %d\n",
+ numAtoms,
+ vsite_atom_range,
+ natperthread);
}
/* To simplify the vsite assignment, we make an index which tells us
* to which task particles, both non-vsites and vsites, are assigned.
*/
- taskIndex_.resize(mdatoms.nr);
+ taskIndex_.resize(numAtoms);
/* Initialize the task index array. Here we assign the non-vsite
* particles to task=thread, so we easily figure out if vsites
*/
{
int thread = 0;
- for (int i = 0; i < mdatoms.nr; i++)
+ for (int i = 0; i < numAtoms; i++)
{
- if (mdatoms.ptype[i] == eptVSite)
+ if (ptype[i] == ParticleType::VSite)
{
/* vsites are not assigned to a task yet */
taskIndex_[i] = -1;
}
/* To avoid large f_buf allocations of #threads*vsite_atom_range
- * we don't use task2 with more than 200000 atoms. This doesn't
- * affect performance, since with such a large range relatively few
- * vsites will end up in the separate task.
- * Note that useTask2 should be the same for all threads.
+ * we don't use the interdependent tasks with more than 200000 atoms.
+ * This doesn't affect performance, since with such a large range
+ * relatively few vsites will end up in the separate task.
+ * Note that useInterdependentTask should be the same for all threads.
*/
- tData.useInterdependentTask = (vsite_atom_range <= 200000);
+ const int c_maxNumLocalAtomsForInterdependentTask = 200000;
+ tData.useInterdependentTask = (vsite_atom_range <= c_maxNumLocalAtomsForInterdependentTask);
if (tData.useInterdependentTask)
{
size_t natoms_use_in_vsites = vsite_atom_range;
else
{
/* The last thread should cover up to the end of the range */
- tData.rangeEnd = mdatoms.nr;
+ tData.rangeEnd = numAtoms;
}
- assignVsitesToThread(&tData, thread, numThreads_, natperthread, taskIndex_, ilists,
- iparams, mdatoms.ptype);
+ assignVsitesToThread(
+ &tData, thread, numThreads_, natperthread, taskIndex_, ilists, iparams, ptype);
if (tData.useInterdependentTask)
{
if (debug && numThreads_ > 1)
{
- fprintf(debug, "virtual site useInterdependentTask %d, nuse:\n",
+ fprintf(debug,
+ "virtual site useInterdependentTask %d, nuse:\n",
static_cast<int>(tData_[0]->useInterdependentTask));
for (int th = 0; th < numThreads_ + 1; th++)
{
fprintf(debug, "%-20s thread dist:", interaction_function[ftype].longname);
for (int th = 0; th < numThreads_ + 1; th++)
{
- fprintf(debug, " %4d %4d ", tData_[th]->ilist[ftype].size(),
+ fprintf(debug,
+ " %4d %4d ",
+ tData_[th]->ilist[ftype].size(),
tData_[th]->idTask.ilist[ftype].size());
}
fprintf(debug, "\n");
}
void VirtualSitesHandler::Impl::setVirtualSites(ArrayRef<const InteractionList> ilists,
- const t_mdatoms& mdatoms)
+ const int numAtoms,
+ const int homenr,
+ ArrayRef<const ParticleType> ptype)
{
ilists_ = ilists;
- threadingInfo_.setVirtualSites(ilists, iparams_, mdatoms, domainInfo_.useDomdec());
+ threadingInfo_.setVirtualSites(ilists, iparams_, numAtoms, homenr, ptype, domainInfo_.useDomdec());
}
-void VirtualSitesHandler::setVirtualSites(ArrayRef<const InteractionList> ilists, const t_mdatoms& mdatoms)
+void VirtualSitesHandler::setVirtualSites(ArrayRef<const InteractionList> ilists,
+ const int numAtoms,
+ const int homenr,
+ ArrayRef<const ParticleType> ptype)
{
- impl_->setVirtualSites(ilists, mdatoms);
+ impl_->setVirtualSites(ilists, numAtoms, homenr, ptype);
}
} // namespace gmx
#include "gromacs/domdec/domdec_struct.h"
#include "gromacs/domdec/gpuhaloexchange.h"
#include "gromacs/domdec/localatomsetmanager.h"
+#include "gromacs/domdec/makebondedlinks.h"
#include "gromacs/domdec/partition.h"
+#include "gromacs/domdec/reversetopology.h"
#include "gromacs/ewald/ewald_utils.h"
+#include "gromacs/ewald/pme.h"
#include "gromacs/ewald/pme_gpu_program.h"
#include "gromacs/ewald/pme_only.h"
#include "gromacs/ewald/pme_pp_comm_gpu.h"
#include "gromacs/hardware/printhardware.h"
#include "gromacs/imd/imd.h"
#include "gromacs/listed_forces/disre.h"
-#include "gromacs/listed_forces/gpubonded.h"
+#include "gromacs/listed_forces/listed_forces_gpu.h"
#include "gromacs/listed_forces/listed_forces.h"
#include "gromacs/listed_forces/orires.h"
#include "gromacs/math/functions.h"
#include "gromacs/mdtypes/mdatom.h"
#include "gromacs/mdtypes/mdrunoptions.h"
#include "gromacs/mdtypes/observableshistory.h"
+#include "gromacs/mdtypes/observablesreducer.h"
#include "gromacs/mdtypes/simulation_workload.h"
#include "gromacs/mdtypes/state.h"
#include "gromacs/mdtypes/state_propagator_data_gpu.h"
#include "gromacs/utility/keyvaluetree.h"
#include "gromacs/utility/logger.h"
#include "gromacs/utility/loggerbuilder.h"
-#include "gromacs/utility/mdmodulenotification.h"
+#include "gromacs/utility/mdmodulesnotifiers.h"
#include "gromacs/utility/physicalnodecommunicator.h"
#include "gromacs/utility/pleasecite.h"
#include "gromacs/utility/programcontext.h"
#include "gromacs/utility/smalloc.h"
#include "gromacs/utility/stringutil.h"
+#include "gromacs/utility/mpiinfo.h"
#include "isimulator.h"
#include "membedholder.h"
{
DevelopmentFeatureFlags devFlags;
- // Some builds of GCC 5 give false positive warnings that these
- // getenv results are ignored when clearly they are used.
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wunused-result"
-
devFlags.enableGpuBufferOps =
GMX_GPU_CUDA && useGpuForNonbonded && (getenv("GMX_USE_GPU_BUFFER_OPS") != nullptr);
- devFlags.enableGpuHaloExchange = GMX_GPU_CUDA && GMX_THREAD_MPI && getenv("GMX_GPU_DD_COMMS") != nullptr;
+ devFlags.enableGpuHaloExchange = GMX_MPI && GMX_GPU_CUDA && getenv("GMX_GPU_DD_COMMS") != nullptr;
devFlags.forceGpuUpdateDefault = (getenv("GMX_FORCE_UPDATE_DEFAULT_GPU") != nullptr) || GMX_FAHCORE;
- devFlags.enableGpuPmePPComm =
- GMX_GPU_CUDA && GMX_THREAD_MPI && getenv("GMX_GPU_PME_PP_COMMS") != nullptr;
+ devFlags.enableGpuPmePPComm = GMX_MPI && GMX_GPU_CUDA && getenv("GMX_GPU_PME_PP_COMMS") != nullptr;
+
+ // Direct GPU comm path is being used with CUDA_AWARE_MPI
+ // make sure underlying MPI implementation is CUDA-aware
+ if (!GMX_THREAD_MPI && (devFlags.enableGpuPmePPComm || devFlags.enableGpuHaloExchange))
+ {
+ const bool haveDetectedCudaAwareMpi =
+ (checkMpiCudaAwareSupport() == CudaAwareMpiStatus::Supported);
+ const bool forceCudaAwareMpi = (getenv("GMX_FORCE_CUDA_AWARE_MPI") != nullptr);
+
+ if (!haveDetectedCudaAwareMpi && forceCudaAwareMpi)
+ {
+ // CUDA-aware support not detected in MPI library but, user has forced it's use
+ GMX_LOG(mdlog.warning)
+ .asParagraph()
+ .appendTextFormatted(
+ "This run has forced use of 'CUDA-aware MPI'. "
+ "But, GROMACS cannot determine if underlying MPI "
+ "is CUDA-aware. GROMACS recommends use of latest openMPI version "
+ "for CUDA-aware support. "
+ "If you observe failures at runtime, try unsetting "
+ "GMX_FORCE_CUDA_AWARE_MPI environment variable.");
+ }
+
+ if (haveDetectedCudaAwareMpi || forceCudaAwareMpi)
+ {
+ devFlags.usingCudaAwareMpi = true;
+ GMX_LOG(mdlog.warning)
+ .asParagraph()
+ .appendTextFormatted(
+ "Using CUDA-aware MPI for 'GPU halo exchange' or 'GPU PME-PP "
+ "communications' feature.");
+ }
+ else
+ {
+ if (devFlags.enableGpuHaloExchange)
+ {
+ GMX_LOG(mdlog.warning)
+ .asParagraph()
+ .appendTextFormatted(
+ "GMX_GPU_DD_COMMS environment variable detected, but the 'GPU "
+ "halo exchange' feature will not be enabled as GROMACS couldn't "
+ "detect CUDA_aware support in underlying MPI implementation.");
+ devFlags.enableGpuHaloExchange = false;
+ }
+ if (devFlags.enableGpuPmePPComm)
+ {
+ GMX_LOG(mdlog.warning)
+ .asParagraph()
+ .appendText(
+ "GMX_GPU_PME_PP_COMMS environment variable detected, but the "
+ "'GPU PME-PP communications' feature will not be enabled as "
+ "GROMACS couldn't "
+ "detect CUDA_aware support in underlying MPI implementation.");
+ devFlags.enableGpuPmePPComm = false;
+ }
-#pragma GCC diagnostic pop
+ GMX_LOG(mdlog.warning)
+ .asParagraph()
+ .appendTextFormatted(
+ "GROMACS recommends use of latest OpenMPI version for CUDA-aware "
+ "support. "
+ "If you are certain about CUDA-aware support in your MPI library, "
+ "you can force it's use by setting environment variable "
+ " GMX_FORCE_CUDA_AWARE_MPI.");
+ }
+ }
if (devFlags.enableGpuBufferOps)
{
{
try
{
- auto masterMdrunner = reinterpret_cast<const gmx::Mdrunner*>(arg);
+ const auto* masterMdrunner = reinterpret_cast<const gmx::Mdrunner*>(arg);
/* copy the arg list to make sure that it's thread-local. This
doesn't copy pointed-to items, of course; fnm, cr and fplog
are reset in the call below, all others should be const. */
#if GMX_THREAD_MPI
/* now spawn new threads that start mdrunner_start_fn(), while
the main thread returns. Thread affinity is handled later. */
- if (tMPI_Init_fn(TRUE, numThreadsToLaunch, TMPI_AFFINITY_NONE, mdrunner_start_fn,
- static_cast<const void*>(this))
+ if (tMPI_Init_fn(TRUE, numThreadsToLaunch, TMPI_AFFINITY_NONE, mdrunner_start_fn, static_cast<const void*>(this))
!= TMPI_SUCCESS)
{
GMX_THROW(gmx::InternalError("Failed to spawn thread-MPI threads"));
t_commrec* cr,
t_inputrec* ir,
int nstlist_cmdline,
- const gmx_mtop_t* mtop,
+ const gmx_mtop_t& mtop,
const matrix box,
bool makeGpuPairList,
const gmx::CpuInfo& cpuinfo)
{
// We checked the cut-offs in grompp, but double-check here.
// We have PME+LJcutoff kernels for rcoulomb>rvdw.
- if (EEL_PME_EWALD(ir->coulombtype) && ir->vdwtype == eelCUT)
+ if (EEL_PME_EWALD(ir->coulombtype) && ir->vdwtype == VanDerWaalsType::Cut)
{
GMX_RELEASE_ASSERT(ir->rcoulomb >= ir->rvdw,
"With Verlet lists and PME we should have rcoulomb>=rvdw");
"With Verlet lists and no PME rcoulomb and rvdw should be identical");
}
/* For NVE simulations, we will retain the initial list buffer */
- if (EI_DYNAMICS(ir->eI) && ir->verletbuf_tol > 0 && !(EI_MD(ir->eI) && ir->etc == etcNO))
+ if (EI_DYNAMICS(ir->eI) && ir->verletbuf_tol > 0
+ && !(EI_MD(ir->eI) && ir->etc == TemperatureCoupling::No))
{
/* Update the Verlet buffer size for the current run setup */
VerletbufListSetup listSetup = verletbufGetSafeListSetup(listType);
const real rlist_new =
- calcVerletBufferSize(*mtop, det(box), *ir, ir->nstlist, ir->nstlist - 1, -1, listSetup);
+ calcVerletBufferSize(mtop, det(box), *ir, ir->nstlist, ir->nstlist - 1, -1, listSetup);
if (rlist_new != ir->rlist)
{
{
fprintf(fplog,
"\nChanging rlist from %g to %g for non-bonded %dx%d atom kernels\n\n",
- ir->rlist, rlist_new, listSetup.cluster_size_i, listSetup.cluster_size_j);
+ ir->rlist,
+ rlist_new,
+ listSetup.cluster_size_i,
+ listSetup.cluster_size_j);
}
ir->rlist = rlist_new;
}
if (nstlist_cmdline > 0 && (!EI_DYNAMICS(ir->eI) || ir->verletbuf_tol <= 0))
{
- gmx_fatal(FARGS, "Can not set nstlist without %s",
+ gmx_fatal(FARGS,
+ "Can not set nstlist without %s",
!EI_DYNAMICS(ir->eI) ? "dynamics" : "verlet-buffer-tolerance");
}
if (EI_DYNAMICS(ir->eI))
{
/* Set or try nstlist values */
- increaseNstlist(fplog, cr, ir, nstlist_cmdline, mtop, box, makeGpuPairList, cpuinfo);
+ increaseNstlist(fplog, cr, ir, nstlist_cmdline, &mtop, box, makeGpuPairList, cpuinfo);
}
}
{
sprintf(sbuf_msg,
"Overriding nsteps with value passed on the command line: %s steps, %.3g ps",
- gmx_step_str(nsteps_cmdline, sbuf_steps), fabs(nsteps_cmdline * ir->delta_t));
+ gmx_step_str(nsteps_cmdline, sbuf_steps),
+ fabs(nsteps_cmdline * ir->delta_t));
}
else
{
- sprintf(sbuf_msg, "Overriding nsteps with value passed on the command line: %s steps",
+ sprintf(sbuf_msg,
+ "Overriding nsteps with value passed on the command line: %s steps",
gmx_step_str(nsteps_cmdline, sbuf_steps));
}
static void finish_run(FILE* fplog,
const gmx::MDLogger& mdlog,
const t_commrec* cr,
- const t_inputrec* inputrec,
+ const t_inputrec& inputrec,
t_nrnb nrnb[],
- gmx_wallcycle_t wcycle,
+ gmx_wallcycle* wcycle,
gmx_walltime_accounting_t walltime_accounting,
nonbonded_verlet_t* nbv,
const gmx_pme_t* pme,
Further, we only report performance for dynamical integrators,
because those are the only ones for which we plan to
consider doing any optimizations. */
- bool printReport = EI_DYNAMICS(inputrec->eI) && SIMMASTER(cr);
+ bool printReport = EI_DYNAMICS(inputrec.eI) && SIMMASTER(cr);
if (printReport && !walltime_accounting_get_valid_finish(walltime_accounting))
{
nrnbTotalStorage = std::make_unique<t_nrnb>();
nrnb_tot = nrnbTotalStorage.get();
#if GMX_MPI
- MPI_Allreduce(nrnb->n, nrnb_tot->n, eNRNB, MPI_DOUBLE, MPI_SUM, cr->mpi_comm_mysim);
+ MPI_Allreduce(nrnb->n.data(), nrnb_tot->n.data(), eNRNB, MPI_DOUBLE, MPI_SUM, cr->mpi_comm_mysim);
#endif
}
else
{
#if GMX_MPI
/* reduce elapsed_time over all MPI ranks in the current simulation */
- MPI_Allreduce(&elapsed_time, &elapsed_time_over_all_ranks, 1, MPI_DOUBLE, MPI_SUM,
- cr->mpi_comm_mysim);
+ MPI_Allreduce(&elapsed_time, &elapsed_time_over_all_ranks, 1, MPI_DOUBLE, MPI_SUM, cr->mpi_comm_mysim);
elapsed_time_over_all_ranks /= cr->nnodes;
/* Reduce elapsed_time_over_all_threads over all MPI ranks in the
* current simulation. */
- MPI_Allreduce(&elapsed_time_over_all_threads, &elapsed_time_over_all_threads_over_all_ranks,
- 1, MPI_DOUBLE, MPI_SUM, cr->mpi_comm_mysim);
+ MPI_Allreduce(&elapsed_time_over_all_threads,
+ &elapsed_time_over_all_threads_over_all_ranks,
+ 1,
+ MPI_DOUBLE,
+ MPI_SUM,
+ cr->mpi_comm_mysim);
#endif
}
else
print_flop(fplog, nrnb_tot, &nbfs, &mflop);
}
- if (thisRankHasDuty(cr, DUTY_PP) && DOMAINDECOMP(cr))
+ if (thisRankHasDuty(cr, DUTY_PP) && haveDDAtomOrdering(*cr))
{
print_dd_statistics(cr, inputrec, fplog);
}
* to the code that handled the thread region, so that there's a
* mechanism to keep cycle counting working during the transition
* to task parallelism. */
- int nthreads_pp = gmx_omp_nthreads_get(emntNonbonded);
- int nthreads_pme = gmx_omp_nthreads_get(emntPME);
- wallcycle_scale_by_num_threads(wcycle, thisRankHasDuty(cr, DUTY_PME) && !thisRankHasDuty(cr, DUTY_PP),
- nthreads_pp, nthreads_pme);
+ int nthreads_pp = gmx_omp_nthreads_get(ModuleMultiThread::Nonbonded);
+ int nthreads_pme = gmx_omp_nthreads_get(ModuleMultiThread::Pme);
+ wallcycle_scale_by_num_threads(
+ wcycle, thisRankHasDuty(cr, DUTY_PME) && !thisRankHasDuty(cr, DUTY_PP), nthreads_pp, nthreads_pme);
auto cycle_sum(wallcycle_sum(cr, wcycle));
if (printReport)
{
- auto nbnxn_gpu_timings =
+ auto* nbnxn_gpu_timings =
(nbv != nullptr && nbv->useGpu()) ? Nbnxm::gpu_get_timings(nbv->gpu_nbv) : nullptr;
gmx_wallclock_gpu_pme_t pme_gpu_timings = {};
{
pme_gpu_get_timings(pme, &pme_gpu_timings);
}
- wallcycle_print(fplog, mdlog, cr->nnodes, cr->npmenodes, nthreads_pp, nthreads_pme,
- elapsed_time_over_all_ranks, wcycle, cycle_sum, nbnxn_gpu_timings,
+ wallcycle_print(fplog,
+ mdlog,
+ cr->nnodes,
+ cr->npmenodes,
+ nthreads_pp,
+ nthreads_pme,
+ elapsed_time_over_all_ranks,
+ wcycle,
+ cycle_sum,
+ nbnxn_gpu_timings,
&pme_gpu_timings);
- if (EI_DYNAMICS(inputrec->eI))
+ if (EI_DYNAMICS(inputrec.eI))
{
- delta_t = inputrec->delta_t;
+ delta_t = inputrec.delta_t;
}
if (fplog)
{
- print_perf(fplog, elapsed_time_over_all_threads_over_all_ranks, elapsed_time_over_all_ranks,
+ print_perf(fplog,
+ elapsed_time_over_all_threads_over_all_ranks,
+ elapsed_time_over_all_ranks,
walltime_accounting_get_nsteps_done_since_reset(walltime_accounting),
- delta_t, nbfs, mflop);
+ delta_t,
+ nbfs,
+ mflop);
}
if (bWriteStat)
{
- print_perf(stderr, elapsed_time_over_all_threads_over_all_ranks, elapsed_time_over_all_ranks,
+ print_perf(stderr,
+ elapsed_time_over_all_threads_over_all_ranks,
+ elapsed_time_over_all_ranks,
walltime_accounting_get_nsteps_done_since_reset(walltime_accounting),
- delta_t, nbfs, mflop);
+ delta_t,
+ nbfs,
+ mflop);
}
}
}
int Mdrunner::mdrunner()
{
- matrix box;
- t_forcerec* fr = nullptr;
- real ewaldcoeff_q = 0;
- real ewaldcoeff_lj = 0;
- int nChargePerturbed = -1, nTypePerturbed = 0;
- gmx_wallcycle_t wcycle;
- gmx_walltime_accounting_t walltime_accounting = nullptr;
- MembedHolder membedHolder(filenames.size(), filenames.data());
+ matrix box;
+ std::unique_ptr<t_forcerec> fr;
+ real ewaldcoeff_q = 0;
+ real ewaldcoeff_lj = 0;
+ int nChargePerturbed = -1, nTypePerturbed = 0;
+ gmx_walltime_accounting_t walltime_accounting = nullptr;
+ MembedHolder membedHolder(filenames.size(), filenames.data());
/* CAUTION: threads may be started later on in this function, so
cr doesn't reflect the final parallel state right now */
gmx_print_detected_hardware(fplog, isSimulationMasterRank && isMasterSim(ms), mdlog, hwinfo_);
- std::vector<int> gpuIdsToUse = makeGpuIdsToUse(hwinfo_->deviceInfoList, hw_opt.gpuIdsAvailable);
- const int numDevicesToUse = gmx::ssize(gpuIdsToUse);
+ std::vector<int> availableDevices =
+ makeListOfAvailableDevices(hwinfo_->deviceInfoList, hw_opt.devicesSelectedByUser);
+ const int numAvailableDevices = gmx::ssize(availableDevices);
// Print citation requests after all software/hardware printing
pleaseCiteGromacs(fplog);
/* Read (nearly) all data required for the simulation
* and keep the partly serialized tpr contents to send to other ranks later
*/
- applyGlobalSimulationState(*inputHolder_.get(), partialDeserializedTpr.get(),
- globalState.get(), inputrec.get(), &mtop);
+ applyGlobalSimulationState(
+ *inputHolder_.get(), partialDeserializedTpr.get(), globalState.get(), inputrec.get(), &mtop);
}
/* Check and update the hardware options for internal consistency */
- checkAndUpdateHardwareOptions(mdlog, &hw_opt, isSimulationMasterRank, domdecOptions.numPmeRanks,
- inputrec.get());
+ checkAndUpdateHardwareOptions(
+ mdlog, &hw_opt, isSimulationMasterRank, domdecOptions.numPmeRanks, inputrec.get());
if (GMX_THREAD_MPI && isSimulationMasterRank)
{
// the number of GPUs to choose the number of ranks.
auto canUseGpuForNonbonded = buildSupportsNonbondedOnGpu(nullptr);
useGpuForNonbonded = decideWhetherToUseGpusForNonbondedWithThreadMpi(
- nonbondedTarget, numDevicesToUse, userGpuTaskAssignment, emulateGpuNonbonded,
+ nonbondedTarget,
+ numAvailableDevices > 0,
+ userGpuTaskAssignment,
+ emulateGpuNonbonded,
canUseGpuForNonbonded,
gpuAccelerationOfNonbondedIsUseful(mdlog, *inputrec, GMX_THREAD_MPI),
hw_opt.nthreads_tmpi);
- useGpuForPme = decideWhetherToUseGpusForPmeWithThreadMpi(
- useGpuForNonbonded, pmeTarget, pmeFftTarget, numDevicesToUse, userGpuTaskAssignment,
- *hwinfo_, *inputrec, hw_opt.nthreads_tmpi, domdecOptions.numPmeRanks);
+ useGpuForPme = decideWhetherToUseGpusForPmeWithThreadMpi(useGpuForNonbonded,
+ pmeTarget,
++ pmeFftTarget,
+ numAvailableDevices,
+ userGpuTaskAssignment,
+ *hwinfo_,
+ *inputrec,
+ hw_opt.nthreads_tmpi,
+ domdecOptions.numPmeRanks);
}
GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
* TODO Over-writing the user-supplied value here does
* prevent any possible subsequent checks from working
* correctly. */
- hw_opt.nthreads_tmpi =
- get_nthreads_mpi(hwinfo_, &hw_opt, numDevicesToUse, useGpuForNonbonded, useGpuForPme,
- inputrec.get(), &mtop, mdlog, membedHolder.doMembed());
+ hw_opt.nthreads_tmpi = get_nthreads_mpi(hwinfo_,
+ &hw_opt,
+ numAvailableDevices,
+ useGpuForNonbonded,
+ useGpuForPme,
+ inputrec.get(),
+ mtop,
+ mdlog,
+ membedHolder.doMembed());
// Now start the threads for thread MPI.
spawnThreads(hw_opt.nthreads_tmpi);
// master and spawned threads joins at the end of this block.
}
- GMX_RELEASE_ASSERT(ms || simulationCommunicator != MPI_COMM_NULL,
+ GMX_RELEASE_ASSERT(!GMX_MPI || ms || simulationCommunicator != MPI_COMM_NULL,
"Must have valid communicator unless running a multi-simulation");
CommrecHandle crHandle = init_commrec(simulationCommunicator);
t_commrec* cr = crHandle.get();
// On non-master ranks, allocate the object that will receive data in the following call.
inputrec = std::make_unique<t_inputrec>();
}
- init_parallel(cr->mpiDefaultCommunicator, MASTER(cr), inputrec.get(), &mtop,
+ init_parallel(cr->mpiDefaultCommunicator,
+ MASTER(cr),
+ inputrec.get(),
+ &mtop,
partialDeserializedTpr.get());
}
GMX_RELEASE_ASSERT(inputrec != nullptr, "All ranks should have a valid inputrec now");
partialDeserializedTpr.reset(nullptr);
+ GMX_RELEASE_ASSERT(
+ !inputrec->useConstantAcceleration,
+ "Linear acceleration has been removed in GROMACS 2022, and was broken for many years "
+ "before that. Use GROMACS 4.5 or earlier if you need this feature.");
+
+ // Now we decide whether to use the domain decomposition machinery.
+ // Note that this does not necessarily imply actually using multiple domains.
// Now the number of ranks is known to all ranks, and each knows
// the inputrec read by the master rank. The ranks can now all run
// the task-deciding functions and will agree on the result
// without needing to communicate.
- const bool useDomainDecomposition = (PAR(cr) && !(EI_TPI(inputrec->eI) || inputrec->eI == eiNM));
+ // The LBFGS minimizer, test-particle insertion, normal modes and shell dynamics don't support DD
+ const bool useDomainDecomposition =
+ !(inputrec->eI == IntegrationAlgorithm::LBFGS || EI_TPI(inputrec->eI)
+ || inputrec->eI == IntegrationAlgorithm::NM
+ || gmx_mtop_particletype_count(mtop)[ParticleType::Shell] > 0);
// Note that these variables describe only their own node.
//
// assignment.
auto canUseGpuForNonbonded = buildSupportsNonbondedOnGpu(nullptr);
useGpuForNonbonded = decideWhetherToUseGpusForNonbonded(
- nonbondedTarget, userGpuTaskAssignment, emulateGpuNonbonded, canUseGpuForNonbonded,
- gpuAccelerationOfNonbondedIsUseful(mdlog, *inputrec, !GMX_THREAD_MPI), gpusWereDetected);
- useGpuForPme = decideWhetherToUseGpusForPme(
- useGpuForNonbonded, pmeTarget, pmeFftTarget, userGpuTaskAssignment, *hwinfo_,
- *inputrec, cr->sizeOfDefaultCommunicator, domdecOptions.numPmeRanks, gpusWereDetected);
- useGpuForBonded = decideWhetherToUseGpusForBonded(useGpuForNonbonded, useGpuForPme,
- bondedTarget, *inputrec, mtop,
- domdecOptions.numPmeRanks, gpusWereDetected);
+ nonbondedTarget,
+ userGpuTaskAssignment,
+ emulateGpuNonbonded,
+ canUseGpuForNonbonded,
+ gpuAccelerationOfNonbondedIsUseful(mdlog, *inputrec, !GMX_THREAD_MPI),
+ gpusWereDetected);
+ useGpuForPme = decideWhetherToUseGpusForPme(useGpuForNonbonded,
+ pmeTarget,
++ pmeFftTarget,
+ userGpuTaskAssignment,
+ *hwinfo_,
+ *inputrec,
+ cr->sizeOfDefaultCommunicator,
+ domdecOptions.numPmeRanks,
+ gpusWereDetected);
+ useGpuForBonded = decideWhetherToUseGpusForBonded(
+ useGpuForNonbonded, useGpuForPme, bondedTarget, *inputrec, mtop, domdecOptions.numPmeRanks, gpusWereDetected);
}
GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
const DevelopmentFeatureFlags devFlags =
manageDevelopmentFeatures(mdlog, useGpuForNonbonded, pmeRunMode);
- const bool useModularSimulator =
- checkUseModularSimulator(false, inputrec.get(), doRerun, mtop, ms, replExParams,
- nullptr, doEssentialDynamics, membedHolder.doMembed());
+ const bool useModularSimulator = checkUseModularSimulator(false,
+ inputrec.get(),
+ doRerun,
+ mtop,
+ ms,
+ replExParams,
+ nullptr,
+ doEssentialDynamics,
+ membedHolder.doMembed());
+
+ ObservablesReducerBuilder observablesReducerBuilder;
// Build restraints.
// TODO: hide restraint implementation details from Mdrunner.
// TODO: Error handling
mdModules_->assignOptionsToModules(*inputrec->params, nullptr);
- // now that the MdModules know their options, they know which callbacks to sign up to
+ // now that the MDModules know their options, they know which callbacks to sign up to
mdModules_->subscribeToSimulationSetupNotifications();
- const auto& mdModulesNotifier = mdModules_->notifier().simulationSetupNotifications_;
+ const auto& setupNotifier = mdModules_->notifiers().simulationSetupNotifier_;
+
+ // Notify MdModules of existing logger
+ setupNotifier.notify(mdlog);
+ // Notify MdModules of internal parameters, saved into KVT
if (inputrec->internalParameters != nullptr)
{
- mdModulesNotifier.notify(*inputrec->internalParameters);
+ setupNotifier.notify(*inputrec->internalParameters);
+ }
+
+ // Let MdModules know the .tpr filename
+ {
+ gmx::MdRunInputFilename mdRunInputFilename = { ftp2fn(efTPR, filenames.size(), filenames.data()) };
+ setupNotifier.notify(mdRunInputFilename);
}
if (fplog != nullptr)
if (SIMMASTER(cr))
{
/* In rerun, set velocities to zero if present */
- if (doRerun && ((globalState->flags & (1 << estV)) != 0))
+ if (doRerun && ((globalState->flags & enumValueToBitMask(StateEntry::V)) != 0))
{
// rerun does not use velocities
GMX_LOG(mdlog.info)
{
clear_rvec(globalState->v[i]);
}
- globalState->flags &= ~(1 << estV);
+ globalState->flags &= ~enumValueToBitMask(StateEntry::V);
}
/* now make sure the state is initialized and propagated */
/* NM and TPI parallelize over force/energy calculations, not atoms,
* so we need to initialize and broadcast the global state.
*/
- if (inputrec->eI == eiNM || inputrec->eI == eiTPI)
+ if (inputrec->eI == IntegrationAlgorithm::NM || inputrec->eI == IntegrationAlgorithm::TPI)
{
if (!MASTER(cr))
{
globalState = std::make_unique<t_state>();
}
- broadcastStateWithoutDynamics(cr->mpiDefaultCommunicator, DOMAINDECOMP(cr), PAR(cr),
- globalState.get());
+ broadcastStateWithoutDynamics(
+ cr->mpiDefaultCommunicator, haveDDAtomOrdering(*cr), PAR(cr), globalState.get());
}
/* A parallel command line option consistency check that we can
#endif
}
- if (doRerun && (EI_ENERGY_MINIMIZATION(inputrec->eI) || eiNM == inputrec->eI))
+ if (doRerun && (EI_ENERGY_MINIMIZATION(inputrec->eI) || IntegrationAlgorithm::NM == inputrec->eI))
{
gmx_fatal(FARGS,
"The .mdp file specified an energy mininization or normal mode algorithm, and "
"these are not compatible with mdrun -rerun");
}
- if (!(EEL_PME(inputrec->coulombtype) || EVDW_PME(inputrec->vdwtype)))
- {
- if (domdecOptions.numPmeRanks > 0)
- {
- gmx_fatal_collective(FARGS, cr->mpiDefaultCommunicator, MASTER(cr),
- "PME-only ranks are requested, but the system does not use PME "
- "for electrostatics or LJ");
- }
-
- domdecOptions.numPmeRanks = 0;
- }
-
- if (useGpuForNonbonded && domdecOptions.numPmeRanks < 0)
- {
- /* With NB GPUs we don't automatically use PME-only CPU ranks. PME ranks can
- * improve performance with many threads per GPU, since our OpenMP
- * scaling is bad, but it's difficult to automate the setup.
- */
- domdecOptions.numPmeRanks = 0;
- }
- if (useGpuForPme)
- {
- if (domdecOptions.numPmeRanks < 0)
- {
- domdecOptions.numPmeRanks = 0;
- // TODO possibly print a note that one can opt-in for a separate PME GPU rank?
- }
- else
- {
- GMX_RELEASE_ASSERT(domdecOptions.numPmeRanks <= 1,
- "PME GPU decomposition is not supported");
- }
- }
-
/* NMR restraints must be initialized before load_checkpoint,
* since with time averaging the history is added to t_state.
* For proper consistency check we therefore need to extend
/* This needs to be called before read_checkpoint to extend the state */
t_disresdata* disresdata;
snew(disresdata, 1);
- init_disres(fplog, &mtop, inputrec.get(), DisResRunMode::MDRun,
+ init_disres(fplog,
+ mtop,
+ inputrec.get(),
+ DisResRunMode::MDRun,
MASTER(cr) ? DDRole::Master : DDRole::Agent,
- PAR(cr) ? NumRanks::Multiple : NumRanks::Single, cr->mpi_comm_mysim, ms, disresdata,
- globalState.get(), replExParams.exchangeInterval > 0);
+ PAR(cr) ? NumRanks::Multiple : NumRanks::Single,
+ cr->mpi_comm_mysim,
+ ms,
+ disresdata,
+ globalState.get(),
+ replExParams.exchangeInterval > 0);
- t_oriresdata* oriresdata;
- snew(oriresdata, 1);
- init_orires(fplog, &mtop, inputrec.get(), cr, ms, globalState.get(), oriresdata);
+ if (gmx_mtop_ftype_count(mtop, F_ORIRES) > 0 && isSimulationMasterRank)
+ {
+ extendStateWithOriresHistory(mtop, *inputrec, globalState.get());
+ }
- auto deform = prepareBoxDeformation(
- globalState != nullptr ? globalState->box : box, MASTER(cr) ? DDRole::Master : DDRole::Agent,
- PAR(cr) ? NumRanks::Multiple : NumRanks::Single, cr->mpi_comm_mygroup, *inputrec);
+ auto deform = prepareBoxDeformation(globalState != nullptr ? globalState->box : box,
+ MASTER(cr) ? DDRole::Master : DDRole::Agent,
+ PAR(cr) ? NumRanks::Multiple : NumRanks::Single,
+ cr->mpi_comm_mygroup,
+ *inputrec);
#if GMX_FAHCORE
/* We have to remember the generation's first step before reading checkpoint.
// Finish applying initial simulation state information from external sources on all ranks.
// Reconcile checkpoint file data with Mdrunner state established up to this point.
- applyLocalState(*inputHolder_.get(), logFileHandle, cr, domdecOptions.numCells,
- inputrec.get(), globalState.get(), &observablesHistory,
- mdrunOptions.reproducible, mdModules_->notifier(),
- modularSimulatorCheckpointData.get(), useModularSimulator);
+ applyLocalState(*inputHolder_.get(),
+ logFileHandle,
+ cr,
+ domdecOptions.numCells,
+ inputrec.get(),
+ globalState.get(),
+ &observablesHistory,
+ mdrunOptions.reproducible,
+ mdModules_->notifiers(),
+ modularSimulatorCheckpointData.get(),
+ useModularSimulator);
// TODO: (#3652) Synchronize filesystem state, SimulationInput contents, and program
// invariants
// on all code paths.
gmx_bcast(sizeof(box), box, cr->mpiDefaultCommunicator);
}
- if (inputrec->cutoff_scheme != ecutsVERLET)
+ if (inputrec->cutoff_scheme != CutoffScheme::Verlet)
{
gmx_fatal(FARGS,
"This group-scheme .tpr file can no longer be run by mdrun. Please update to the "
* increase rlist) tries to check if the newly chosen value fits with the DD scheme. As this is
* run before any DD scheme is set up, this check is never executed. See #3334 for more details.
*/
- prepare_verlet_scheme(fplog, cr, inputrec.get(), nstlist_cmdline, &mtop, box,
+ prepare_verlet_scheme(fplog,
+ cr,
+ inputrec.get(),
+ nstlist_cmdline,
+ mtop,
+ box,
useGpuForNonbonded || (emulateGpuNonbonded == EmulateGpuNonbonded::Yes),
*hwinfo_->cpuInfo);
+ // We need to decide on update groups early, as this affects
+ // inter-domain communication distances.
+ auto updateGroupingsPerMoleculeType = makeUpdateGroupingsPerMoleculeType(mtop);
+ const real maxUpdateGroupRadius = computeMaxUpdateGroupRadius(
+ mtop, updateGroupingsPerMoleculeType, maxReferenceTemperature(*inputrec));
+ const real cutoffMargin = std::sqrt(max_cutoff2(inputrec->pbcType, box)) - inputrec->rlist;
+ UpdateGroups updateGroups = makeUpdateGroups(mdlog,
+ std::move(updateGroupingsPerMoleculeType),
+ maxUpdateGroupRadius,
+ useDomainDecomposition,
+ systemHasConstraintsOrVsites(mtop),
+ cutoffMargin);
+
+ try
+ {
+ const bool haveFrozenAtoms = inputrecFrozenAtoms(inputrec.get());
+
+ useGpuForUpdate = decideWhetherToUseGpuForUpdate(useDomainDecomposition,
+ updateGroups.useUpdateGroups(),
+ pmeRunMode,
+ domdecOptions.numPmeRanks > 0,
+ useGpuForNonbonded,
+ updateTarget,
+ gpusWereDetected,
+ *inputrec,
+ mtop,
+ doEssentialDynamics,
+ gmx_mtop_ftype_count(mtop, F_ORIRES) > 0,
+ haveFrozenAtoms,
+ doRerun,
+ devFlags,
+ mdlog);
+ }
+ GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
+
+ bool useGpuDirectHalo = false;
+
+ if (useGpuForNonbonded)
+ {
+ // cr->npmenodes is not yet initialized.
+ // domdecOptions.numPmeRanks == -1 results in 0 separate PME ranks when useGpuForNonbonded is true.
+ // Todo: remove this assumption later once auto mode has support for separate PME rank
+ const int numPmeRanks = domdecOptions.numPmeRanks > 0 ? domdecOptions.numPmeRanks : 0;
+ bool havePPDomainDecomposition = (cr->sizeOfDefaultCommunicator - numPmeRanks) > 1;
+ useGpuDirectHalo = decideWhetherToUseGpuForHalo(devFlags,
+ havePPDomainDecomposition,
+ useGpuForNonbonded,
+ useModularSimulator,
+ doRerun,
+ EI_ENERGY_MINIMIZATION(inputrec->eI));
+ }
+
// This builder is necessary while we have multi-part construction
// of DD. Before DD is constructed, we use the existence of
// the builder object to indicate that further construction of DD
std::unique_ptr<DomainDecompositionBuilder> ddBuilder;
if (useDomainDecomposition)
{
- ddBuilder = std::make_unique<DomainDecompositionBuilder>(
- mdlog, cr, domdecOptions, mdrunOptions, mtop, *inputrec, box,
- positionsFromStatePointer(globalState.get()));
+ // P2P GPU comm + GPU update leads to case in which we enqueue async work for multiple
+ // timesteps. DLB needs to be disabled in that case
+ const bool directGpuCommUsedWithGpuUpdate = GMX_THREAD_MPI && useGpuDirectHalo && useGpuForUpdate;
+ ddBuilder = std::make_unique<DomainDecompositionBuilder>(
+ mdlog,
+ cr,
+ domdecOptions,
+ mdrunOptions,
+ mtop,
+ *inputrec,
+ mdModules_->notifiers(),
+ box,
+ updateGroups.updateGroupingPerMoleculeType(),
+ updateGroups.useUpdateGroups(),
+ updateGroups.maxUpdateGroupRadius(),
+ positionsFromStatePointer(globalState.get()),
+ useGpuForNonbonded,
+ useGpuForPme,
+ directGpuCommUsedWithGpuUpdate);
}
else
{
// Produce the task assignment for this rank - done after DD is constructed
GpuTaskAssignments gpuTaskAssignments = GpuTaskAssignmentsBuilder::build(
- gpuIdsToUse, userGpuTaskAssignment, *hwinfo_, simulationCommunicator, physicalNodeComm,
- nonbondedTarget, pmeTarget, bondedTarget, updateTarget, useGpuForNonbonded,
- useGpuForPme, thisRankHasDuty(cr, DUTY_PP),
+ availableDevices,
+ userGpuTaskAssignment,
+ *hwinfo_,
+ simulationCommunicator,
+ physicalNodeComm,
+ nonbondedTarget,
+ pmeTarget,
+ bondedTarget,
+ updateTarget,
+ useGpuForNonbonded,
+ useGpuForPme,
+ thisRankHasDuty(cr, DUTY_PP),
// TODO cr->duty & DUTY_PME should imply that a PME
// algorithm is active, but currently does not.
EEL_PME(inputrec->coulombtype) && thisRankHasDuty(cr, DUTY_PME));
// requires it (e.g. pull, CompEl, density fitting), so that we
// don't update the local atom sets unilaterally every step.
LocalAtomSetManager atomSets;
+
+ // Local state and topology are declared (and perhaps constructed)
+ // now, because DD needs them for the LocalTopologyChecker, but
+ // they do not contain valid data until after the first DD
+ // partition.
+ std::unique_ptr<t_state> localStateInstance;
+ t_state* localState;
+ gmx_localtop_t localTopology(mtop.ffparams);
+
if (ddBuilder)
{
+ localStateInstance = std::make_unique<t_state>();
+ localState = localStateInstance.get();
// TODO Pass the GPU streams to ddBuilder to use in buffer
// transfers (e.g. halo exchange)
- cr->dd = ddBuilder->build(&atomSets);
+ cr->dd = ddBuilder->build(&atomSets, localTopology, *localState, &observablesReducerBuilder);
// The builder's job is done, so destruct it
ddBuilder.reset(nullptr);
// Note that local state still does not exist yet.
}
+ else
+ {
+ // Without DD, the local state is merely an alias to the global state,
+ // so we don't need to allocate anything.
+ localState = globalState.get();
+ }
+
// Ensure that all atoms within the same update group are in the
// same periodic image. Otherwise, a simulation that did not use
// update groups (e.g. a single-rank simulation) cannot always be
}
}
- // The GPU update is decided here because we need to know whether the constraints or
- // SETTLEs can span accross the domain borders (i.e. whether or not update groups are
- // defined). This is only known after DD is initialized, hence decision on using GPU
- // update is done so late.
- try
- {
- const bool useUpdateGroups = cr->dd ? ddUsesUpdateGroups(*cr->dd) : false;
- const bool haveFrozenAtoms = inputrecFrozenAtoms(inputrec.get());
-
- useGpuForUpdate = decideWhetherToUseGpuForUpdate(
- useDomainDecomposition, useUpdateGroups, pmeRunMode, domdecOptions.numPmeRanks > 0,
- useGpuForNonbonded, updateTarget, gpusWereDetected, *inputrec, mtop,
- doEssentialDynamics, gmx_mtop_ftype_count(mtop, F_ORIRES) > 0,
- replExParams.exchangeInterval > 0, haveFrozenAtoms, doRerun, devFlags, mdlog);
- }
- GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
-
const bool printHostName = (cr->nnodes > 1);
gpuTaskAssignments.reportGpuUsage(mdlog, printHostName, useGpuForBonded, pmeRunMode, useGpuForUpdate);
MdrunScheduleWorkload runScheduleWork;
- bool useGpuDirectHalo = decideWhetherToUseGpuForHalo(
- devFlags, havePPDomainDecomposition(cr), useGpuForNonbonded, useModularSimulator,
- doRerun, EI_ENERGY_MINIMIZATION(inputrec->eI));
-
// Also populates the simulation constant workload description.
- runScheduleWork.simulationWork = createSimulationWorkload(
- *inputrec, disableNonbondedCalculation, devFlags, useGpuForNonbonded, pmeRunMode,
- useGpuForBonded, useGpuForUpdate, useGpuDirectHalo);
+ // Note: currently the default duty is DUTY_PP | DUTY_PME for all simulations, including those without PME,
+ // so this boolean is sufficient on all ranks to determine whether separate PME ranks are used,
+ // but this will no longer be the case if cr->duty is changed for !EEL_PME(fr->ic->eeltype).
+ const bool haveSeparatePmeRank = (!thisRankHasDuty(cr, DUTY_PP) || !thisRankHasDuty(cr, DUTY_PME));
+ runScheduleWork.simulationWork = createSimulationWorkload(*inputrec,
+ disableNonbondedCalculation,
+ devFlags,
+ havePPDomainDecomposition(cr),
+ haveSeparatePmeRank,
+ useGpuForNonbonded,
+ pmeRunMode,
+ useGpuForBonded,
+ useGpuForUpdate,
+ useGpuDirectHalo);
std::unique_ptr<DeviceStreamManager> deviceStreamManager = nullptr;
if (deviceInfo != nullptr)
{
- if (DOMAINDECOMP(cr) && thisRankHasDuty(cr, DUTY_PP))
+ if (runScheduleWork.simulationWork.havePpDomainDecomposition && thisRankHasDuty(cr, DUTY_PP))
{
dd_setup_dlb_resource_sharing(cr, deviceId);
}
// where appropriate.
if (!userGpuTaskAssignment.empty())
{
- gpuTaskAssignments.logPerformanceHints(mdlog, numDevicesToUse);
+ gpuTaskAssignments.logPerformanceHints(mdlog, numAvailableDevices);
}
if (PAR(cr))
.appendTextFormatted(
"This is simulation %d out of %d running as a composite GROMACS\n"
"multi-simulation job. Setup for this simulation:\n",
- ms->simulationIndex_, ms->numSimulations_);
+ ms->simulationIndex_,
+ ms->numSimulations_);
}
GMX_LOG(mdlog.warning)
- .appendTextFormatted("Using %d MPI %s\n", cr->nnodes,
+ .appendTextFormatted("Using %d MPI %s\n",
+ cr->nnodes,
# if GMX_THREAD_MPI
cr->nnodes == 1 ? "thread" : "threads"
# else
// the OpenMP support.
gmx_check_thread_affinity_set(mdlog, &hw_opt, hwinfo_->nthreads_hw_avail, FALSE);
/* Check and update the number of OpenMP threads requested */
- checkAndUpdateRequestedNumOpenmpThreads(&hw_opt, *hwinfo_, cr, ms, physicalNodeComm.size_,
- pmeRunMode, mtop, *inputrec);
-
- gmx_omp_nthreads_init(mdlog, cr, hwinfo_->nthreads_hw_avail, physicalNodeComm.size_,
- hw_opt.nthreads_omp, hw_opt.nthreads_omp_pme, !thisRankHasDuty(cr, DUTY_PP));
-
- // Enable FP exception detection, but not in
- // Release mode and not for compilers with known buggy FP
- // exception support (clang with any optimization) or suspected
- // buggy FP exception support (gcc 7.* with optimization).
-#if !defined NDEBUG \
- && !((defined __clang__ || (defined(__GNUC__) && !defined(__ICC) && __GNUC__ == 7)) \
- && defined __OPTIMIZE__)
- const bool bEnableFPE = true;
-#else
- const bool bEnableFPE = false;
-#endif
+ checkAndUpdateRequestedNumOpenmpThreads(
+ &hw_opt, *hwinfo_, cr, ms, physicalNodeComm.size_, pmeRunMode, mtop, *inputrec);
+
+ gmx_omp_nthreads_init(mdlog,
+ cr,
+ hwinfo_->nthreads_hw_avail,
+ physicalNodeComm.size_,
+ hw_opt.nthreads_omp,
+ hw_opt.nthreads_omp_pme,
+ !thisRankHasDuty(cr, DUTY_PP));
+
+ const bool bEnableFPE = gmxShouldEnableFPExceptions();
// FIXME - reconcile with gmx_feenableexcept() call from CommandLineModuleManager::run()
if (bEnableFPE)
{
}
/* Now that we know the setup is consistent, check for efficiency */
- check_resource_division_efficiency(hwinfo_, gpuTaskAssignments.thisRankHasAnyGpuTask(),
- mdrunOptions.ntompOptionIsSet, cr, mdlog);
+ check_resource_division_efficiency(
+ hwinfo_, gpuTaskAssignments.thisRankHasAnyGpuTask(), mdrunOptions.ntompOptionIsSet, cr, mdlog);
/* getting number of PP/PME threads on this MPI / tMPI rank.
PME: env variable should be read only on one node to make sure it is
identical everywhere;
*/
- const int numThreadsOnThisRank = thisRankHasDuty(cr, DUTY_PP) ? gmx_omp_nthreads_get(emntNonbonded)
- : gmx_omp_nthreads_get(emntPME);
- checkHardwareOversubscription(numThreadsOnThisRank, cr->nodeid, *hwinfo_->hardwareTopology,
- physicalNodeComm, mdlog);
+ const int numThreadsOnThisRank = thisRankHasDuty(cr, DUTY_PP)
+ ? gmx_omp_nthreads_get(ModuleMultiThread::Nonbonded)
+ : gmx_omp_nthreads_get(ModuleMultiThread::Pme);
+ checkHardwareOversubscription(
+ numThreadsOnThisRank, cr->nodeid, *hwinfo_->hardwareTopology, physicalNodeComm, mdlog);
// Enable Peer access between GPUs where available
// Only for DD, only master PP rank needs to perform setup, and only if thread MPI plus
// any of the GPU communication features are active.
- if (DOMAINDECOMP(cr) && MASTER(cr) && thisRankHasDuty(cr, DUTY_PP) && GMX_THREAD_MPI
+ if (haveDDAtomOrdering(*cr) && MASTER(cr) && thisRankHasDuty(cr, DUTY_PP) && GMX_THREAD_MPI
&& (runScheduleWork.simulationWork.useGpuHaloExchange
|| runScheduleWork.simulationWork.useGpuPmePpCommunication))
{
- setupGpuDevicePeerAccess(gpuIdsToUse, mdlog);
+ setupGpuDevicePeerAccess(gpuTaskAssignments.deviceIdsAssigned(), mdlog);
}
if (hw_opt.threadAffinity != ThreadAffinity::Off)
gmx_check_thread_affinity_set(mdlog, &hw_opt, hwinfo_->nthreads_hw_avail, TRUE);
int numThreadsOnThisNode, intraNodeThreadOffset;
- analyzeThreadsOnThisNode(physicalNodeComm, numThreadsOnThisRank, &numThreadsOnThisNode,
- &intraNodeThreadOffset);
+ analyzeThreadsOnThisNode(
+ physicalNodeComm, numThreadsOnThisRank, &numThreadsOnThisNode, &intraNodeThreadOffset);
/* Set the CPU affinity */
- gmx_set_thread_affinity(mdlog, cr, &hw_opt, *hwinfo_->hardwareTopology, numThreadsOnThisRank,
- numThreadsOnThisNode, intraNodeThreadOffset, nullptr);
+ gmx_set_thread_affinity(mdlog,
+ cr,
+ &hw_opt,
+ *hwinfo_->hardwareTopology,
+ numThreadsOnThisRank,
+ numThreadsOnThisNode,
+ intraNodeThreadOffset,
+ nullptr);
}
if (mdrunOptions.timingOptions.resetStep > -1)
"The -resetstep functionality is deprecated, and may be removed in a "
"future version.");
}
- wcycle = wallcycle_init(fplog, mdrunOptions.timingOptions.resetStep, cr);
+ std::unique_ptr<gmx_wallcycle> wcycle =
+ wallcycle_init(fplog, mdrunOptions.timingOptions.resetStep, cr);
if (PAR(cr))
{
/* Master synchronizes its value of reset_counters with all nodes
* including PME only nodes */
- int64_t reset_counters = wcycle_get_reset_counters(wcycle);
+ int64_t reset_counters = wcycle_get_reset_counters(wcycle.get());
gmx_bcast(sizeof(reset_counters), &reset_counters, cr->mpi_comm_mysim);
- wcycle_set_reset_counters(wcycle, reset_counters);
+ wcycle_set_reset_counters(wcycle.get(), reset_counters);
}
// Membrane embedding must be initialized before we call init_forcerec()
- membedHolder.initializeMembed(fplog, filenames.size(), filenames.data(), &mtop, inputrec.get(),
- globalState.get(), cr, &mdrunOptions.checkpointOptions.period);
+ membedHolder.initializeMembed(fplog,
+ filenames.size(),
+ filenames.data(),
+ &mtop,
+ inputrec.get(),
+ globalState.get(),
+ cr,
+ &mdrunOptions.checkpointOptions.period);
const bool thisRankHasPmeGpuTask = gpuTaskAssignments.thisRankHasPmeGpuTask();
std::unique_ptr<MDAtoms> mdAtoms;
std::unique_ptr<VirtualSitesHandler> vsite;
- std::unique_ptr<GpuBonded> gpuBonded;
t_nrnb nrnb;
if (thisRankHasDuty(cr, DUTY_PP))
{
- mdModulesNotifier.notify(*cr);
- mdModulesNotifier.notify(&atomSets);
- mdModulesNotifier.notify(inputrec->pbcType);
- mdModulesNotifier.notify(SimulationTimeStep{ inputrec->delta_t });
+ setupNotifier.notify(*cr);
+ setupNotifier.notify(&atomSets);
+ setupNotifier.notify(mtop);
+ setupNotifier.notify(inputrec->pbcType);
+ setupNotifier.notify(SimulationTimeStep{ inputrec->delta_t });
/* Initiate forcerecord */
- fr = new t_forcerec;
+ fr = std::make_unique<t_forcerec>();
fr->forceProviders = mdModules_->initForceProviders();
- init_forcerec(fplog, mdlog, fr, inputrec.get(), &mtop, cr, box,
+ init_forcerec(fplog,
+ mdlog,
+ runScheduleWork.simulationWork,
+ fr.get(),
+ *inputrec,
+ mtop,
+ cr,
+ box,
opt2fn("-table", filenames.size(), filenames.data()),
opt2fn("-tablep", filenames.size(), filenames.data()),
- opt2fns("-tableb", filenames.size(), filenames.data()), pforce);
+ opt2fns("-tableb", filenames.size(), filenames.data()),
+ pforce);
// Dirty hack, for fixing disres and orires should be made mdmodules
fr->fcdata->disres = disresdata;
- fr->fcdata->orires = oriresdata;
+ if (gmx_mtop_ftype_count(mtop, F_ORIRES) > 0)
+ {
+ fr->fcdata->orires = std::make_unique<t_oriresdata>(
+ fplog, mtop, *inputrec, ms, globalState.get(), &atomSets);
+ }
// Save a handle to device stream manager to use elsewhere in the code
// TODO: Forcerec is not a correct place to store it.
"GPU PP-PME stream should be valid in order to use GPU PME-PP direct "
"communications.");
fr->pmePpCommGpu = std::make_unique<gmx::PmePpCommGpu>(
- cr->mpi_comm_mysim, cr->dd->pme_nodeid, deviceStreamManager->context(),
+ cr->mpi_comm_mysim,
+ cr->dd->pme_nodeid,
+ &cr->dd->pmeForceReceiveBuffer,
+ deviceStreamManager->context(),
deviceStreamManager->stream(DeviceStreamType::PmePpTransfer));
}
- fr->nbv = Nbnxm::init_nb_verlet(mdlog, inputrec.get(), fr, cr, *hwinfo_,
+ fr->nbv = Nbnxm::init_nb_verlet(mdlog,
+ *inputrec,
+ *fr,
+ cr,
+ *hwinfo_,
runScheduleWork.simulationWork.useGpuNonbonded,
- deviceStreamManager.get(), &mtop, box, wcycle);
+ deviceStreamManager.get(),
+ mtop,
+ box,
+ wcycle.get());
// TODO: Move the logic below to a GPU bonded builder
if (runScheduleWork.simulationWork.useGpuBonded)
{
GMX_RELEASE_ASSERT(deviceStreamManager != nullptr,
"GPU device stream manager should be valid in order to use GPU "
"version of bonded forces.");
- gpuBonded = std::make_unique<GpuBonded>(
- mtop.ffparams, fr->ic->epsfac * fr->fudgeQQ, deviceStreamManager->context(),
- deviceStreamManager->bondedStream(havePPDomainDecomposition(cr)), wcycle);
- fr->gpuBonded = gpuBonded.get();
+ fr->listedForcesGpu = std::make_unique<ListedForcesGpu>(
+ mtop.ffparams,
+ fr->ic->epsfac * fr->fudgeQQ,
+ deviceStreamManager->context(),
+ deviceStreamManager->bondedStream(havePPDomainDecomposition(cr)),
+ wcycle.get());
}
+ fr->longRangeNonbondeds = std::make_unique<CpuPpLongRangeNonbondeds>(fr->n_tpi,
+ fr->ic->ewaldcoeff_q,
+ fr->ic->epsilon_r,
+ fr->qsum,
+ fr->ic->eeltype,
+ fr->ic->vdwtype,
+ *inputrec,
+ &nrnb,
+ wcycle.get(),
+ fplog);
/* Initialize the mdAtoms structure.
* mdAtoms is not filled with atom data,
}
/* Initialize the virtual site communication */
- vsite = makeVirtualSitesHandler(mtop, cr, fr->pbcType);
+ vsite = makeVirtualSitesHandler(
+ mtop, cr, fr->pbcType, updateGroups.updateGroupingPerMoleculeType());
calc_shifts(box, fr->shift_vec);
constructVirtualSitesGlobal(mtop, globalState->x);
}
}
+ // Make the DD reverse topology, now that any vsites that are present are available
+ if (haveDDAtomOrdering(*cr))
+ {
+ dd_make_reverse_top(fplog, cr->dd, mtop, vsite.get(), *inputrec, domdecOptions.ddBondedChecking);
+ }
if (EEL_PME(fr->ic->eeltype) || EVDW_PME(fr->ic->vdwtype))
{
const DeviceContext* deviceContext = runScheduleWork.simulationWork.useGpuPme
? &deviceStreamManager->context()
: nullptr;
- const DeviceStream* pmeStream =
+ const DeviceStream* pmeStream =
runScheduleWork.simulationWork.useGpuPme
- ? &deviceStreamManager->stream(DeviceStreamType::Pme)
- : nullptr;
-
- pmedata = gmx_pme_init(cr, getNumPmeDomains(cr->dd), inputrec.get(),
- nChargePerturbed != 0, nTypePerturbed != 0,
- mdrunOptions.reproducible, ewaldcoeff_q, ewaldcoeff_lj,
- gmx_omp_nthreads_get(emntPME), pmeRunMode, nullptr,
- deviceContext, pmeStream, pmeGpuProgram.get(), mdlog);
+ ? &deviceStreamManager->stream(DeviceStreamType::Pme)
+ : nullptr;
+
+ pmedata = gmx_pme_init(cr,
+ getNumPmeDomains(cr->dd),
+ inputrec.get(),
+ nChargePerturbed != 0,
+ nTypePerturbed != 0,
+ mdrunOptions.reproducible,
+ ewaldcoeff_q,
+ ewaldcoeff_lj,
+ gmx_omp_nthreads_get(ModuleMultiThread::Pme),
+ pmeRunMode,
+ nullptr,
+ deviceContext,
+ pmeStream,
+ pmeGpuProgram.get(),
+ mdlog);
}
GMX_CATCH_ALL_AND_EXIT_WITH_FATAL_ERROR
}
if (thisRankHasDuty(cr, DUTY_PP))
{
/* Assumes uniform use of the number of OpenMP threads */
- walltime_accounting = walltime_accounting_init(gmx_omp_nthreads_get(emntDefault));
+ walltime_accounting = walltime_accounting_init(gmx_omp_nthreads_get(ModuleMultiThread::Default));
if (inputrec->bPull)
{
/* Initialize pull code */
- pull_work = init_pull(fplog, inputrec->pull.get(), inputrec.get(), &mtop, cr, &atomSets,
+ pull_work = init_pull(fplog,
+ inputrec->pull.get(),
+ inputrec.get(),
+ mtop,
+ cr,
+ &atomSets,
inputrec->fepvals->init_lambda);
if (inputrec->pull->bXOutAverage || inputrec->pull->bFOutAverage)
{
if (inputrec->bRot)
{
/* Initialize enforced rotation code */
- enforcedRotation = init_rot(fplog, inputrec.get(), filenames.size(), filenames.data(),
- cr, &atomSets, globalState.get(), &mtop, oenv, mdrunOptions,
+ enforcedRotation = init_rot(fplog,
+ inputrec.get(),
+ filenames.size(),
+ filenames.data(),
+ cr,
+ &atomSets,
+ globalState.get(),
+ mtop,
+ oenv,
+ mdrunOptions,
startingBehavior);
}
t_swap* swap = nullptr;
- if (inputrec->eSwapCoords != eswapNO)
+ if (inputrec->eSwapCoords != SwapType::No)
{
/* Initialize ion swapping code */
- swap = init_swapcoords(fplog, inputrec.get(),
+ swap = init_swapcoords(fplog,
+ inputrec.get(),
opt2fn_master("-swap", filenames.size(), filenames.data(), cr),
- &mtop, globalState.get(), &observablesHistory, cr, &atomSets,
- oenv, mdrunOptions, startingBehavior);
+ mtop,
+ globalState.get(),
+ &observablesHistory,
+ cr,
+ &atomSets,
+ oenv,
+ mdrunOptions,
+ startingBehavior);
}
/* Let makeConstraints know whether we have essential dynamics constraints. */
- auto constr = makeConstraints(mtop, *inputrec, pull_work, doEssentialDynamics, fplog, cr,
- ms, &nrnb, wcycle, fr->bMolPBC);
+ auto constr = makeConstraints(mtop,
+ *inputrec,
+ pull_work,
+ doEssentialDynamics,
+ fplog,
+ cr,
+ updateGroups.useUpdateGroups(),
+ ms,
+ &nrnb,
+ wcycle.get(),
+ fr->bMolPBC,
+ &observablesReducerBuilder);
/* Energy terms and groups */
gmx_enerdata_t enerd(mtop.groups.groups[SimulationAtomGroupType::EnergyOutput].size(),
// cos acceleration is only supported by md, but older tpr
// files might still combine it with other integrators
- GMX_RELEASE_ASSERT(inputrec->cos_accel == 0.0 || inputrec->eI == eiMD,
+ GMX_RELEASE_ASSERT(inputrec->cos_accel == 0.0 || inputrec->eI == IntegrationAlgorithm::MD,
"cos_acceleration is only supported by integrator=md");
/* Kinetic energy data */
- gmx_ekindata_t ekind;
- init_ekindata(fplog, &mtop, &(inputrec->opts), &ekind, inputrec->cos_accel);
+ gmx_ekindata_t ekind(inputrec->opts.ngtc,
+ inputrec->cos_accel,
+ gmx_omp_nthreads_get(ModuleMultiThread::Update));
/* Set up interactive MD (IMD) */
- auto imdSession =
- makeImdSession(inputrec.get(), cr, wcycle, &enerd, ms, &mtop, mdlog,
- MASTER(cr) ? globalState->x.rvec_array() : nullptr, filenames.size(),
- filenames.data(), oenv, mdrunOptions.imdOptions, startingBehavior);
-
- if (DOMAINDECOMP(cr))
+ auto imdSession = makeImdSession(inputrec.get(),
+ cr,
+ wcycle.get(),
+ &enerd,
+ ms,
+ mtop,
+ mdlog,
+ MASTER(cr) ? globalState->x : gmx::ArrayRef<gmx::RVec>(),
+ filenames.size(),
+ filenames.data(),
+ oenv,
+ mdrunOptions.imdOptions,
+ startingBehavior);
+
+ if (haveDDAtomOrdering(*cr))
{
GMX_RELEASE_ASSERT(fr, "fr was NULL while cr->duty was DUTY_PP");
- /* This call is not included in init_domain_decomposition mainly
- * because fr->cginfo_mb is set later.
+ /* This call is not included in init_domain_decomposition
+ * because fr->atomInfoForEachMoleculeBlock is set later.
*/
- dd_init_bondeds(fplog, cr->dd, mtop, vsite.get(), inputrec.get(),
- domdecOptions.checkBondedInteractions, fr->cginfo_mb);
+ makeBondedLinks(cr->dd, mtop, fr->atomInfoForEachMoleculeBlock);
}
if (runScheduleWork.simulationWork.useGpuBufferOps)
{
fr->gpuForceReduction[gmx::AtomLocality::Local] = std::make_unique<gmx::GpuForceReduction>(
deviceStreamManager->context(),
- deviceStreamManager->stream(gmx::DeviceStreamType::NonBondedLocal), wcycle);
+ deviceStreamManager->stream(gmx::DeviceStreamType::NonBondedLocal),
+ wcycle.get());
fr->gpuForceReduction[gmx::AtomLocality::NonLocal] = std::make_unique<gmx::GpuForceReduction>(
deviceStreamManager->context(),
- deviceStreamManager->stream(gmx::DeviceStreamType::NonBondedNonLocal), wcycle);
+ deviceStreamManager->stream(gmx::DeviceStreamType::NonBondedNonLocal),
+ wcycle.get());
}
std::unique_ptr<gmx::StatePropagatorDataGpu> stateGpu;
&& ((runScheduleWork.simulationWork.useGpuPme && thisRankHasDuty(cr, DUTY_PME))
|| runScheduleWork.simulationWork.useGpuBufferOps))
{
- GpuApiCallBehavior transferKind = (inputrec->eI == eiMD && !doRerun && !useModularSimulator)
- ? GpuApiCallBehavior::Async
- : GpuApiCallBehavior::Sync;
+ GpuApiCallBehavior transferKind =
+ (inputrec->eI == IntegrationAlgorithm::MD && !doRerun && !useModularSimulator)
+ ? GpuApiCallBehavior::Async
+ : GpuApiCallBehavior::Sync;
GMX_RELEASE_ASSERT(deviceStreamManager != nullptr,
"GPU device stream manager should be initialized to use GPU.");
stateGpu = std::make_unique<gmx::StatePropagatorDataGpu>(
- *deviceStreamManager, transferKind, pme_gpu_get_block_size(fr->pmedata), wcycle);
+ *deviceStreamManager, transferKind, pme_gpu_get_block_size(fr->pmedata), wcycle.get());
fr->stateGpu = stateGpu.get();
}
GMX_ASSERT(stopHandlerBuilder_, "Runner must provide StopHandlerBuilder to simulator.");
SimulatorBuilder simulatorBuilder;
- simulatorBuilder.add(SimulatorStateData(globalState.get(), &observablesHistory, &enerd, &ekind));
+ simulatorBuilder.add(SimulatorStateData(
+ globalState.get(), localState, &observablesHistory, &enerd, &ekind));
simulatorBuilder.add(std::move(membedHolder));
simulatorBuilder.add(std::move(stopHandlerBuilder_));
simulatorBuilder.add(SimulatorConfig(mdrunOptions, startingBehavior, &runScheduleWork));
- simulatorBuilder.add(SimulatorEnv(fplog, cr, ms, mdlog, oenv));
- simulatorBuilder.add(Profiling(&nrnb, walltime_accounting, wcycle));
+ simulatorBuilder.add(SimulatorEnv(fplog, cr, ms, mdlog, oenv, &observablesReducerBuilder));
+ simulatorBuilder.add(Profiling(&nrnb, walltime_accounting, wcycle.get()));
simulatorBuilder.add(ConstraintsParam(
- constr.get(), enforcedRotation ? enforcedRotation->getLegacyEnfrot() : nullptr,
- vsite.get()));
+ constr.get(), enforcedRotation ? enforcedRotation->getLegacyEnfrot() : nullptr, vsite.get()));
// TODO: Separate `fr` to a separate add, and make the `build` handle the coupling sensibly.
- simulatorBuilder.add(LegacyInput(static_cast<int>(filenames.size()), filenames.data(),
- inputrec.get(), fr));
+ simulatorBuilder.add(LegacyInput(
+ static_cast<int>(filenames.size()), filenames.data(), inputrec.get(), fr.get()));
simulatorBuilder.add(ReplicaExchangeParameters(replExParams));
simulatorBuilder.add(InteractiveMD(imdSession.get()));
- simulatorBuilder.add(SimulatorModules(mdModules_->outputProvider(), mdModules_->notifier()));
+ simulatorBuilder.add(SimulatorModules(mdModules_->outputProvider(), mdModules_->notifiers()));
simulatorBuilder.add(CenterOfMassPulling(pull_work));
// Todo move to an MDModule
simulatorBuilder.add(IonSwapping(swap));
- simulatorBuilder.add(TopologyData(&mtop, mdAtoms.get()));
+ simulatorBuilder.add(TopologyData(mtop, &localTopology, mdAtoms.get()));
simulatorBuilder.add(BoxDeformationHandle(deform.get()));
simulatorBuilder.add(std::move(modularSimulatorCheckpointData));
{
GMX_RELEASE_ASSERT(pmedata, "pmedata was NULL while cr->duty was not DUTY_PP");
/* do PME only */
- walltime_accounting = walltime_accounting_init(gmx_omp_nthreads_get(emntPME));
- gmx_pmeonly(pmedata, cr, &nrnb, wcycle, walltime_accounting, inputrec.get(), pmeRunMode,
+ walltime_accounting = walltime_accounting_init(gmx_omp_nthreads_get(ModuleMultiThread::Pme));
+ gmx_pmeonly(pmedata,
+ cr,
+ &nrnb,
+ wcycle.get(),
+ walltime_accounting,
+ inputrec.get(),
+ pmeRunMode,
+ runScheduleWork.simulationWork.useGpuPmePpCommunication,
deviceStreamManager.get());
}
- wallcycle_stop(wcycle, ewcRUN);
+ wallcycle_stop(wcycle.get(), WallCycleCounter::Run);
/* Finish up, write some stuff
* if rerunMD, don't write last frame again
*/
- finish_run(fplog, mdlog, cr, inputrec.get(), &nrnb, wcycle, walltime_accounting,
- fr ? fr->nbv.get() : nullptr, pmedata, EI_DYNAMICS(inputrec->eI) && !isMultiSim(ms));
+ finish_run(fplog,
+ mdlog,
+ cr,
+ *inputrec,
+ &nrnb,
+ wcycle.get(),
+ walltime_accounting,
+ fr ? fr->nbv.get() : nullptr,
+ pmedata,
+ EI_DYNAMICS(inputrec->eI) && !isMultiSim(ms));
- // clean up cycle counter
- wallcycle_destroy(wcycle);
deviceStreamManager.reset(nullptr);
// Free PME data
// As soon as we destroy GPU contexts after mdrunner() exits, these lines should go.
mdAtoms.reset(nullptr);
globalState.reset(nullptr);
+ localStateInstance.reset(nullptr);
mdModules_.reset(nullptr); // destruct force providers here as they might also use the GPU
- gpuBonded.reset(nullptr);
- /* Free pinned buffers in *fr */
- delete fr;
- fr = nullptr;
+ fr.reset(nullptr); // destruct forcerec before gpu
// TODO convert to C++ so we can get rid of these frees
sfree(disresdata);
- sfree(oriresdata);
if (!hwinfo_->deviceInfoList.empty())
{
{
physicalNodeComm.barrier();
}
- releaseDevice(deviceInfo);
+
+ if (!devFlags.usingCudaAwareMpi)
+ {
+ // Don't reset GPU in case of CUDA-AWARE MPI
+ // UCX creates CUDA buffers which are cleaned-up as part of MPI_Finalize()
+ // resetting the device before MPI_Finalize() results in crashes inside UCX
+ releaseDevice(deviceInfo);
+ }
/* Does what it says */
print_date_and_time(fplog, cr->nodeid, "Finished mdrun", gmx_gettime());
*
* Copyright (c) 2009,2010,2011,2012,2013 by the GROMACS development team.
* Copyright (c) 2014,2015,2016,2017,2018 by the GROMACS development team.
- * Copyright (c) 2019,2020, by the GROMACS development team, led by
+ * Copyright (c) 2019,2020,2021, by the GROMACS development team, led by
* Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
* and including many others, as listed in the AUTHORS file in the
* top-level source directory and at http://www.gromacs.org.
{
if (sel->flags & SEL_UNSORTED)
{
+ gmx_ana_index_reserve(sel->v.u.g, sel->u.cgrp.isize);
// This only works if g contains all the atoms, but that is currently
// the only supported case.
gmx_ana_index_copy(sel->v.u.g, &sel->u.cgrp, false);
expandValueForPositions(sel->v.u.r, &sel->v.nr, sel->u.expr.pos);
break;
default:
- GMX_RELEASE_ASSERT(false,
- "Unimplemented value type for position update method");
+ GMX_RELEASE_ASSERT(false, "Unimplemented value type for position update method");
}
}
}
#include "gromacs/hardware/detecthardware.h"
#include "gromacs/hardware/hardwaretopology.h"
#include "gromacs/hardware/hw_info.h"
-#include "gromacs/listed_forces/gpubonded.h"
+#include "gromacs/listed_forces/listed_forces_gpu.h"
#include "gromacs/mdlib/gmx_omp_nthreads.h"
#include "gromacs/mdlib/update_constrain_gpu.h"
#include "gromacs/mdtypes/commrec.h"
{
//! Helper variable to localise the text of an often repeated message.
-const char* g_specifyEverythingFormatString =
+const char* const g_specifyEverythingFormatString =
"When you use mdrun -gputasks, %s must be set to non-default "
"values, so that the device IDs can be interpreted correctly."
#if GMX_GPU
// OpenCL standard, but the only current relevant case for GROMACS
// is AMD OpenCL, which offers this variable.
"GPU_DEVICE_ORDINAL"
-# elif GMX_GPU_SYCL
- // As with OpenCL, there are no portable way to do it.
- // Intel reference: https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
- // While SYCL_DEVICE_FILTER is a better option, as of 2021.1-beta10 it is not yet supported.
- "SYCL_DEVICE_ALLOWLIST"
+# elif GMX_GPU_SYCL && GMX_SYCL_DPCPP
+ // https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
+ "SYCL_DEVICE_FILTER"
+# elif GMX_GPU_SYCL && GMX_SYCL_HIPSYCL
+ // Not true if we use hipSYCL over CUDA or IntelLLVM, but in that case the user probably
+ // knows what they are doing.
+ // https://rocmdocs.amd.com/en/latest/Other_Solutions/Other-Solutions.html#hip-environment-variables
+ "HIP_VISIBLE_DEVICES"
# else
# error "Unreachable branch"
# endif
} // namespace
bool decideWhetherToUseGpusForNonbondedWithThreadMpi(const TaskTarget nonbondedTarget,
- const int numDevicesToUse,
+ const bool haveAvailableDevices,
const std::vector<int>& userGpuTaskAssignment,
const EmulateGpuNonbonded emulateGpuNonbonded,
const bool buildSupportsNonbondedOnGpu,
// all potential ranks can use, and can use that in a global
// decision that will later be consistent.
// If we get here, then the user permitted or required GPUs.
- return (numDevicesToUse > 0);
+ return haveAvailableDevices;
}
bool decideWhetherToUseGpusForPmeWithThreadMpi(const bool useGpuForNonbonded,
const TaskTarget pmeTarget,
+ const TaskTarget pmeFftTarget,
const int numDevicesToUse,
const std::vector<int>& userGpuTaskAssignment,
const gmx_hw_info_t& hardwareInfo,
if ((pmeTarget == TaskTarget::Cpu) || !useGpuForNonbonded || !pme_gpu_supports_build(nullptr)
|| !pme_gpu_supports_hardware(hardwareInfo, nullptr) || !pme_gpu_supports_input(inputrec, nullptr))
{
- // PME can't run on a GPU. If the user required that, we issue
- // an error later.
+ // PME can't run on a GPU. If the user required that, we issue an error later.
+ return false;
+ }
+
+ if (pmeFftTarget == TaskTarget::Cpu && !pme_gpu_mixed_mode_supports_input(inputrec, nullptr))
+ {
+ /* User requested PME FFT on CPU, but the current system is not compatible with Mixed mode,
+ * so we don't use GPUs at all.
+ * If the user had -pme gpu, we issue an error later. */
return false;
}
bool decideWhetherToUseGpusForPme(const bool useGpuForNonbonded,
const TaskTarget pmeTarget,
+ const TaskTarget pmeFftTarget,
const std::vector<int>& userGpuTaskAssignment,
const gmx_hw_info_t& hardwareInfo,
const t_inputrec& inputrec,
}
return false;
}
+ if (pmeFftTarget == TaskTarget::Cpu && !pme_gpu_mixed_mode_supports_input(inputrec, &message))
+ {
+ /* User requested PME FFT on CPU, but the current system is not compatible with Mixed mode,
+ * so we don't use GPUs at all. */
+ if (pmeTarget == TaskTarget::Gpu)
+ {
+ GMX_THROW(NotImplementedError("Cannot compute PME interactions in Mixed mode, because " + message));
+ }
+ return false;
+ }
if (pmeTarget == TaskTarget::Cpu)
{
std::string errorMessage;
- if (!buildSupportsGpuBondeds(&errorMessage))
+ if (!buildSupportsListedForcesGpu(&errorMessage))
{
if (bondedTarget == TaskTarget::Gpu)
{
return false;
}
- if (!inputSupportsGpuBondeds(inputrec, mtop, &errorMessage))
+ if (!inputSupportsListedForcesGpu(inputrec, mtop, &errorMessage))
{
if (bondedTarget == TaskTarget::Gpu)
{
// is busy, for which we currently only check PME or Ewald.
// (It would be better to dynamically assign bondeds based on timings)
// Note that here we assume that the auto setting of PME ranks will not
- // choose seperate PME ranks when nonBonded are assigned to the GPU.
+ // choose separate PME ranks when nonBonded are assigned to the GPU.
bool usingOurCpuForPmeOrEwald =
(EVDW_PME(inputrec.vdwtype)
|| (EEL_PME_EWALD(inputrec.coulombtype) && !useGpuForPme && numPmeRanksPerSimulation <= 0));
const gmx_mtop_t& mtop,
const bool useEssentialDynamics,
const bool doOrientationRestraints,
- const bool useReplicaExchange,
const bool haveFrozenAtoms,
const bool doRerun,
const DevelopmentFeatureFlags& devFlags,
errorMessage += "Multiple time stepping is not supported.\n";
}
- if (inputrec.eConstrAlg == econtSHAKE && hasAnyConstraints && gmx_mtop_ftype_count(mtop, F_CONSTR) > 0)
+ if (inputrec.eConstrAlg == ConstraintAlgorithm::Shake && hasAnyConstraints
+ && gmx_mtop_ftype_count(mtop, F_CONSTR) > 0)
{
errorMessage += "SHAKE constraints are not supported.\n";
}
errorMessage +=
"Either PME or short-ranged non-bonded interaction tasks must run on the GPU.\n";
}
-
if (!gpusWereDetected)
{
errorMessage += "Compatible GPUs must have been found.\n";
{
errorMessage += "Only a CUDA build is supported.\n";
}
- if (inputrec.eI != eiMD)
+ if (inputrec.eI != IntegrationAlgorithm::MD)
{
errorMessage += "Only the md integrator is supported.\n";
}
- if (inputrec.etc == etcNOSEHOOVER)
+ if (inputrec.etc == TemperatureCoupling::NoseHoover)
{
errorMessage += "Nose-Hoover temperature coupling is not supported.\n";
}
- if (!(inputrec.epc == epcNO || inputrec.epc == epcPARRINELLORAHMAN
- || inputrec.epc == epcBERENDSEN || inputrec.epc == epcCRESCALE))
+ if (!(inputrec.epc == PressureCoupling::No || inputrec.epc == PressureCoupling::ParrinelloRahman
+ || inputrec.epc == PressureCoupling::Berendsen || inputrec.epc == PressureCoupling::CRescale))
{
errorMessage +=
"Only Parrinello-Rahman, Berendsen, and C-rescale pressure coupling are "
// The graph is needed, but not supported
errorMessage += "Orientation restraints are not supported.\n";
}
- if (inputrec.efep != efepNO && (haveFepPerturbedMasses(mtop) || havePerturbedConstraints(mtop)))
+ if (inputrec.efep != FreeEnergyPerturbationType::No
+ && (haveFepPerturbedMasses(mtop) || havePerturbedConstraints(mtop)))
{
errorMessage += "Free energy perturbation for mass and constraints are not supported.\n";
}
const auto particleTypes = gmx_mtop_particletype_count(mtop);
- if (particleTypes[eptShell] > 0)
+ if (particleTypes[ParticleType::Shell] > 0)
{
errorMessage += "Shells are not supported.\n";
}
- if (useReplicaExchange)
- {
- errorMessage += "Replica exchange simulations are not supported.\n";
- }
- if (inputrec.eSwapCoords != eswapNO)
+ if (inputrec.eSwapCoords != SwapType::No)
{
errorMessage += "Swapping the coordinates is not supported.\n";
}
"The number of coupled constraints is higher than supported in the GPU LINCS "
"code.\n";
}
+ if (hasAnyConstraints && !UpdateConstrainGpu::areConstraintsSupported())
+ {
+ errorMessage += "Chosen GPU implementation does not support constraints.\n";
+ }
if (haveFrozenAtoms)
{
// There is a known bug with frozen atoms and GPU update, see Issue #3920.
bool enableGpuHaloExchange = false;
//! True if the PME PP direct communication GPU development feature is enabled
bool enableGpuPmePPComm = false;
+ //! True if the CUDA-aware MPI is being used for GPU direct communication feature
+ bool usingCudaAwareMpi = false;
};
*
* \param[in] nonbondedTarget The user's choice for mdrun -nb for where to assign
* short-ranged nonbonded interaction tasks.
- * \param[in] numDevicesToUse Number of compatible GPUs that the user permitted
- * us to use.
+ * \param[in] haveAvailableDevices Whether there are available devices.
* \param[in] userGpuTaskAssignment The user-specified assignment of GPU tasks to device IDs.
* \param[in] emulateGpuNonbonded Whether we will emulate GPU calculation of nonbonded
* interactions.
* \throws std::bad_alloc If out of memory
* InconsistentInputError If the user requirements are inconsistent. */
bool decideWhetherToUseGpusForNonbondedWithThreadMpi(TaskTarget nonbondedTarget,
- int numDevicesToUse,
+ bool haveAvailableDevices,
const std::vector<int>& userGpuTaskAssignment,
EmulateGpuNonbonded emulateGpuNonbonded,
bool buildSupportsNonbondedOnGpu,
* \param[in] useGpuForNonbonded Whether GPUs will be used for nonbonded interactions.
* \param[in] pmeTarget The user's choice for mdrun -pme for where to assign
* long-ranged PME nonbonded interaction tasks.
+ * \param[in] pmeFftTarget The user's choice for mdrun -pmefft for where to run FFT.
* \param[in] numDevicesToUse The number of compatible GPUs that the user permitted us to use.
* \param[in] userGpuTaskAssignment The user-specified assignment of GPU tasks to device IDs.
* \param[in] hardwareInfo Hardware information
* InconsistentInputError If the user requirements are inconsistent. */
bool decideWhetherToUseGpusForPmeWithThreadMpi(bool useGpuForNonbonded,
TaskTarget pmeTarget,
+ TaskTarget pmeFftTarget,
int numDevicesToUse,
const std::vector<int>& userGpuTaskAssignment,
const gmx_hw_info_t& hardwareInfo,
*
* \param[in] useGpuForNonbonded Whether GPUs will be used for nonbonded interactions.
* \param[in] pmeTarget The user's choice for mdrun -pme for where to assign long-ranged PME nonbonded interaction tasks.
+ * \param[in] pmeFftTarget The user's choice for mdrun -pmefft for where to do FFT for PME.
* \param[in] userGpuTaskAssignment The user-specified assignment of GPU tasks to device IDs.
* \param[in] hardwareInfo Hardware information
* \param[in] inputrec The user input
* InconsistentInputError If the user requirements are inconsistent. */
bool decideWhetherToUseGpusForPme(bool useGpuForNonbonded,
TaskTarget pmeTarget,
+ TaskTarget pmeFftTarget,
const std::vector<int>& userGpuTaskAssignment,
const gmx_hw_info_t& hardwareInfo,
const t_inputrec& inputrec,
* \param[in] mtop The global topology.
* \param[in] useEssentialDynamics If essential dynamics is active.
* \param[in] doOrientationRestraints If orientation restraints are enabled.
- * \param[in] useReplicaExchange If this is a REMD simulation.
* \param[in] haveFrozenAtoms If this simulation has frozen atoms (see Issue #3920).
* \param[in] doRerun It this is a rerun.
* \param[in] devFlags GPU development / experimental feature flags.
const gmx_mtop_t& mtop,
bool useEssentialDynamics,
bool doOrientationRestraints,
- bool useReplicaExchange,
bool haveFrozenAtoms,
bool doRerun,
const DevelopmentFeatureFlags& devFlags,