Compiler
--------
Technically, |Gromacs| can be compiled on any platform with an ANSI C99
-and C++98 compiler, and their respective standard C/C++ libraries.
-We use only a few C99 features, but note that the C++ compiler also needs to
-support these C99 features (notably, int64_t and related things), which are not
-part of the C++98 standard.
+and C++11 compiler, and their respective standard C/C++ libraries.
+|Gromacs| uses only a subset of C99 and C++11, so a compiler that is not
+fully standards-compliant may still be able to compile |Gromacs|.
Getting good performance on an OS and architecture requires choosing a
good compiler. In practice, many compilers struggle to do a good job
optimizing the |Gromacs| architecture-optimized SIMD kernels.
+C++11 support is required both in the compiler and in the C++ library.
+Several compilers do not provide their own C++ library but use the
+system library instead, so you must select a library with sufficient
+C++11 support. On Linux, both the Intel and clang compilers use the
+libstdc++ that comes with gcc as their default C++ library, and at least
+version 4.6.1 of that library is required. The C++ library version must
+also be supported by the compiler. To select the C++ library version, use:
+
+* For Intel: ``CXXFLAGS=-gcc-name=/path/to/gcc/binary``, or make sure
+ that the correct gcc version is first in your ``PATH`` (e.g. by loading
+ the appropriate gcc module).
+* For clang: ``CFLAGS=--gcc-toolchain=/path/to/gcc/folder
+ CXXFLAGS=--gcc-toolchain=/path/to/gcc/folder``. This folder should
+ contain ``include/c++``.
+* On Windows (e.g. with the Intel compiler), at least MSVC 2013 is
+ required. Load the environment with ``vcvarsall.bat``.
+
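To confirm which C++ library a compiler setup actually picks up (for example, after passing ``-gcc-name`` or ``--gcc-toolchain``), you can compile a tiny translation unit with the same flags and inspect the library's identification macros. The block below is a minimal sketch and is not part of |Gromacs|. The function names are hypothetical. It assumes the usual vendor macros: ``__GLIBCXX__`` for libstdc++, ``_LIBCPP_VERSION`` for libc++, and ``_CPPLIB_VER`` for the MSVC library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: report which C++ standard library this translation
// unit was compiled against. The vendor macros are only defined after a
// library header (here <string>) has been included.
std::string cxxLibraryName()
{
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    // libstdc++: __GLIBCXX__ is a release date stamp (YYYYMMDD), not the gcc
    // version number, so it identifies the library family, not "4.6.1".
    return "libstdc++ (date stamp " + std::to_string(__GLIBCXX__) + ")";
#elif defined(_CPPLIB_VER)
    return "MSVC/Dinkumware library";
#else
    return "unknown";
#endif
}

// Hypothetical helper: the value of __cplusplus for this compilation.
// 201103L or higher means the compiler claims C++11 mode. Caveats: MSVC
// reports 199711L unless /Zc:__cplusplus is given, and gcc 4.6 reports 1
// even with -std=c++0x.
long cxxStandardVersion()
{
    return __cplusplus;
}
```

Because ``__GLIBCXX__`` is a date stamp, the sketch only tells you which library family was selected. You still need to check that the gcc installation named by the toolchain flags is 4.6.1 or newer.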
For best performance, the |Gromacs| team strongly recommends you get the
most recent version of your preferred compiler for your platform.
There is a large amount of |Gromacs| code that depends on effective
To make it possible to use other accelerators, |Gromacs| also includes
OpenCL_ support. The current version is recommended for use with
- GCN-based AMD GPUs. It does work with NVIDIA GPUs, but see the
- known limitations in the user guide. The minimum
- OpenCL version required is |REQUIRED_OPENCL_MIN_VERSION|.
+ GCN-based AMD GPUs. It also works with NVIDIA GPUs, but we recommend
+ using the latest NVIDIA driver (which includes the NVIDIA OpenCL
+ runtime); please also see the known limitations in the |Gromacs| user
+ guide. The minimum OpenCL version required is |REQUIRED_OPENCL_MIN_VERSION|.
It is not possible to configure both CUDA and OpenCL support in the
same version of |Gromacs|.
-------------------------
* Compiling to run on NVIDIA GPUs requires CUDA_
* Compiling to run on AMD GPUs requires OpenCL_
-* An external Boost library can be used to provide better
- implementation support for smart pointers and exception handling,
- but the |Gromacs| source bundles a subset of Boost 1.55.0 as a fallback
* Hardware-optimized BLAS and LAPACK libraries are useful
for a few of the |Gromacs| utilities focused on normal modes and
matrix manipulation, but they do not provide any benefits for normal
it works because we have tested it. We do test on Linux, Windows, and
Mac with a range of compilers and libraries for a range of our
configuration options. Every commit in our git source code repository
-is currently tested on x86 with gcc versions ranging from 4.1 through
-5.1, and versions 12 through 15 of the Intel compiler as well as Clang
+is currently tested on x86 with gcc versions ranging from 4.6 through
+5.1, and versions 14 and 15 of the Intel compiler as well as Clang
version 3.4 through 3.6. For this, we use a variety of GNU/Linux
flavors and versions as well as recent versions of Mac OS X and Windows. Under
Windows we test both MSVC and the Intel compiler. For details, you can
--- /dev/null
+/*
+ * This file is part of the GROMACS molecular simulation package.
+ *
+ * Copyright (c) 2014,2015, by the GROMACS development team, led by
+ * Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
+ * and including many others, as listed in the AUTHORS file in the
+ * top-level source directory and at http://www.gromacs.org.
+ *
+ * GROMACS is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1
+ * of the License, or (at your option) any later version.
+ *
+ * GROMACS is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with GROMACS; if not, see
+ * http://www.gnu.org/licenses, or write to the Free Software Foundation,
+ * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * If you want to redistribute modifications to GROMACS, please
+ * consider that scientific software is very special. Version
+ * control is crucial - bugs must be traceable. We will be happy to
+ * consider code for inclusion in the official distribution, but
+ * derived work must not be called official GROMACS. Details are found
+ * in the README & COPYING files - if they are missing, get the
+ * official version at http://www.gromacs.org.
+ *
+ * To help us fund GROMACS development, we humbly ask that you cite
+ * the research papers on the package. Check out http://www.gromacs.org.
+ */
+/*! \internal \file
+ * \brief Define utility routines for OpenCL
+ *
+ * \author Anca Hamuraru <anca@streamcomputing.eu>
+ */
+#include "gmxpre.h"
+
+#include "oclutils.h"
+
+#include <stdlib.h>
+
+#include <cassert>
+#include <cstdio>
+
+#include "gromacs/utility/fatalerror.h"
+#include "gromacs/utility/smalloc.h"
+
+/*! \brief Launches synchronous or asynchronous host to device memory copy.
+ *
+ * If copy_event is not NULL, on return it will contain an event object
+ * identifying this particular host to device operation. The event can further
+ * be used to queue a wait for this operation or to query profiling information.
+ */
+static int ocl_copy_H2D_generic(cl_mem d_dest, void* h_src,
+ size_t offset, size_t bytes,
+ bool bAsync /* = false*/,
+ cl_command_queue command_queue,
+ cl_event *copy_event)
+{
+ cl_int gmx_unused cl_error;
+
+ if (d_dest == NULL || h_src == NULL || bytes == 0)
+ {
+ return -1;
+ }
+
+ if (bAsync)
+ {
+ cl_error = clEnqueueWriteBuffer(command_queue, d_dest, CL_FALSE, offset, bytes, h_src, 0, NULL, copy_event);
+ assert(cl_error == CL_SUCCESS);
+ // TODO: handle errors
+ }
+ else
+ {
+ cl_error = clEnqueueWriteBuffer(command_queue, d_dest, CL_TRUE, offset, bytes, h_src, 0, NULL, copy_event);
+ assert(cl_error == CL_SUCCESS);
+ // TODO: handle errors
+ }
+
+ return 0;
+}
+
+/*! \brief Launches asynchronous host to device memory copy.
+ *
+ * If copy_event is not NULL, on return it will contain an event object
+ * identifying this particular host to device operation. The event can further
+ * be used to queue a wait for this operation or to query profiling information.
+ */
+int ocl_copy_H2D_async(cl_mem d_dest, void * h_src,
+ size_t offset, size_t bytes,
+ cl_command_queue command_queue,
+ cl_event *copy_event)
+{
+ return ocl_copy_H2D_generic(d_dest, h_src, offset, bytes, true, command_queue, copy_event);
+}
+
+/*! \brief Launches synchronous host to device memory copy.
+ */
+int ocl_copy_H2D(cl_mem d_dest, void * h_src,
+ size_t offset, size_t bytes,
+ cl_command_queue command_queue)
+{
+ return ocl_copy_H2D_generic(d_dest, h_src, offset, bytes, false, command_queue, NULL);
+}
+
+/*! \brief Launches synchronous or asynchronous device to host memory copy.
+ *
+ * If copy_event is not NULL, on return it will contain an event object
+ * identifying this particular device to host operation. The event can further
+ * be used to queue a wait for this operation or to query profiling information.
+ */
+int ocl_copy_D2H_generic(void * h_dest, cl_mem d_src,
+ size_t offset, size_t bytes,
+ bool bAsync,
+ cl_command_queue command_queue,
+ cl_event *copy_event)
+{
+ cl_int gmx_unused cl_error;
+
+ if (h_dest == NULL || d_src == NULL || bytes == 0)
+ {
+ return -1;
+ }
+
+ if (bAsync)
+ {
+ cl_error = clEnqueueReadBuffer(command_queue, d_src, CL_FALSE, offset, bytes, h_dest, 0, NULL, copy_event);
+ assert(cl_error == CL_SUCCESS);
+ // TODO: handle errors
+ }
+ else
+ {
+ cl_error = clEnqueueReadBuffer(command_queue, d_src, CL_TRUE, offset, bytes, h_dest, 0, NULL, copy_event);
+ assert(cl_error == CL_SUCCESS);
+ // TODO: handle errors
+ }
+
+ return 0;
+}
+
+/*! \brief Launches asynchronous device to host memory copy.
+ *
+ * If copy_event is not NULL, on return it will contain an event object
+ * identifying this particular device to host operation. The event can further
+ * be used to queue a wait for this operation or to query profiling information.
+ */
+int ocl_copy_D2H_async(void * h_dest, cl_mem d_src,
+ size_t offset, size_t bytes,
+ cl_command_queue command_queue,
+ cl_event *copy_event)
+{
+ return ocl_copy_D2H_generic(h_dest, d_src, offset, bytes, true, command_queue, copy_event);
+}
+
+/*! \brief Allocates nbytes of host memory. Use ocl_pfree to free memory allocated with this function.
+ *
+ * \todo
+ * This function should allocate page-locked memory to help reduce D2H and H2D
+ * transfer times, similar to pmalloc from pmalloc_cuda.cu.
+ *
+ * \param[in,out] h_ptr Pointer where to store the address of the newly allocated buffer.
+ * \param[in] nbytes Size in bytes of the buffer to be allocated.
+ */
+void ocl_pmalloc(void **h_ptr, size_t nbytes)
+{
+ /* Need a temporary type whose size is 1 byte, so that the
+ * implementation of snew_aligned can cope without issuing
+ * warnings. */
+ char **temporary = reinterpret_cast<char **>(h_ptr);
+
+ /* 16-byte alignment is required by the neighbour-searching code,
+ * because it uses four-wide SIMD for bounding-box calculation.
+ * However, when we organize using page-locked memory for
+ * device-host transfers, it will probably need to be aligned to a
+ * 4kb page, like CUDA does. */
+ snew_aligned(*temporary, nbytes, 16);
+}
+
+/*! \brief Frees memory allocated with ocl_pmalloc.
+ *
+ * \param[in] h_ptr Buffer allocated with ocl_pmalloc that needs to be freed.
+ */
+void ocl_pfree(void *h_ptr)
+{
+
+ if (h_ptr)
+ {
+ sfree_aligned(h_ptr);
+ }
+ return;
+}
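ocl_pmalloc() currently obtains 16-byte-aligned memory through snew_aligned(), and its comment anticipates page (4 kB) alignment for device-host transfers. The sketch below shows the general over-allocate-and-round-up technique that aligned allocators commonly use. It is an illustration with hypothetical names, not the snew_aligned() implementation. It only aligns memory and does not page-lock it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical aligned host allocator (not GROMACS code): over-allocate,
// round the address up to the requested alignment, and stash the original
// malloc() pointer in the slot just before the returned block.
void *alignedHostAlloc(std::size_t nbytes, std::size_t alignment)
{
    // Alignment must be a power of two and large enough that the stash slot
    // in front of the aligned block is itself suitably aligned for a pointer.
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);

    void *raw = std::malloc(nbytes + alignment + sizeof(void *));
    if (raw == nullptr)
    {
        return nullptr;
    }
    // Leave room for the stash slot, then round up to the next multiple of alignment.
    std::uintptr_t start   = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    std::uintptr_t aligned = (start + alignment - 1) & ~(std::uintptr_t(alignment) - 1);

    reinterpret_cast<void **>(aligned)[-1] = raw;
    return reinterpret_cast<void *>(aligned);
}

// Frees memory obtained from alignedHostAlloc(); a null pointer is ignored,
// mirroring the check in ocl_pfree().
void alignedHostFree(void *ptr)
{
    if (ptr != nullptr)
    {
        std::free(reinterpret_cast<void **>(ptr)[-1]);
    }
}
```

Page-locking would additionally need driver support. For example, OpenCL buffers created with ``CL_MEM_ALLOC_HOST_PTR`` are commonly backed by pinned memory. This sketch does not attempt that.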
++
++/*! \brief Convert error code to diagnostic string */
++const char *ocl_get_error_string(cl_int error)
++{
++ switch (error)
++ {
++ // run-time and JIT compiler errors
++ case 0: return "CL_SUCCESS";
++ case -1: return "CL_DEVICE_NOT_FOUND";
++ case -2: return "CL_DEVICE_NOT_AVAILABLE";
++ case -3: return "CL_COMPILER_NOT_AVAILABLE";
++ case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
++ case -5: return "CL_OUT_OF_RESOURCES";
++ case -6: return "CL_OUT_OF_HOST_MEMORY";
++ case -7: return "CL_PROFILING_INFO_NOT_AVAILABLE";
++ case -8: return "CL_MEM_COPY_OVERLAP";
++ case -9: return "CL_IMAGE_FORMAT_MISMATCH";
++ case -10: return "CL_IMAGE_FORMAT_NOT_SUPPORTED";
++ case -11: return "CL_BUILD_PROGRAM_FAILURE";
++ case -12: return "CL_MAP_FAILURE";
++ case -13: return "CL_MISALIGNED_SUB_BUFFER_OFFSET";
++ case -14: return "CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST";
++ case -15: return "CL_COMPILE_PROGRAM_FAILURE";
++ case -16: return "CL_LINKER_NOT_AVAILABLE";
++ case -17: return "CL_LINK_PROGRAM_FAILURE";
++ case -18: return "CL_DEVICE_PARTITION_FAILED";
++ case -19: return "CL_KERNEL_ARG_INFO_NOT_AVAILABLE";
++
++ // compile-time errors
++ case -30: return "CL_INVALID_VALUE";
++ case -31: return "CL_INVALID_DEVICE_TYPE";
++ case -32: return "CL_INVALID_PLATFORM";
++ case -33: return "CL_INVALID_DEVICE";
++ case -34: return "CL_INVALID_CONTEXT";
++ case -35: return "CL_INVALID_QUEUE_PROPERTIES";
++ case -36: return "CL_INVALID_COMMAND_QUEUE";
++ case -37: return "CL_INVALID_HOST_PTR";
++ case -38: return "CL_INVALID_MEM_OBJECT";
++ case -39: return "CL_INVALID_IMAGE_FORMAT_DESCRIPTOR";
++ case -40: return "CL_INVALID_IMAGE_SIZE";
++ case -41: return "CL_INVALID_SAMPLER";
++ case -42: return "CL_INVALID_BINARY";
++ case -43: return "CL_INVALID_BUILD_OPTIONS";
++ case -44: return "CL_INVALID_PROGRAM";
++ case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
++ case -46: return "CL_INVALID_KERNEL_NAME";
++ case -47: return "CL_INVALID_KERNEL_DEFINITION";
++ case -48: return "CL_INVALID_KERNEL";
++ case -49: return "CL_INVALID_ARG_INDEX";
++ case -50: return "CL_INVALID_ARG_VALUE";
++ case -51: return "CL_INVALID_ARG_SIZE";
++ case -52: return "CL_INVALID_KERNEL_ARGS";
++ case -53: return "CL_INVALID_WORK_DIMENSION";
++ case -54: return "CL_INVALID_WORK_GROUP_SIZE";
++ case -55: return "CL_INVALID_WORK_ITEM_SIZE";
++ case -56: return "CL_INVALID_GLOBAL_OFFSET";
++ case -57: return "CL_INVALID_EVENT_WAIT_LIST";
++ case -58: return "CL_INVALID_EVENT";
++ case -59: return "CL_INVALID_OPERATION";
++ case -60: return "CL_INVALID_GL_OBJECT";
++ case -61: return "CL_INVALID_BUFFER_SIZE";
++ case -62: return "CL_INVALID_MIP_LEVEL";
++ case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";
++ case -64: return "CL_INVALID_PROPERTY";
++ case -65: return "CL_INVALID_IMAGE_DESCRIPTOR";
++ case -66: return "CL_INVALID_COMPILER_OPTIONS";
++ case -67: return "CL_INVALID_LINKER_OPTIONS";
++ case -68: return "CL_INVALID_DEVICE_PARTITION_COUNT";
++
++ // extension errors
++ case -1000: return "CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR";
++ case -1001: return "CL_PLATFORM_NOT_FOUND_KHR";
++ case -1002: return "CL_INVALID_D3D10_DEVICE_KHR";
++ case -1003: return "CL_INVALID_D3D10_RESOURCE_KHR";
++ case -1004: return "CL_D3D10_RESOURCE_ALREADY_ACQUIRED_KHR";
++ case -1005: return "CL_D3D10_RESOURCE_NOT_ACQUIRED_KHR";
++ default: return "Unknown OpenCL error";
++ }
++}
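With ocl_get_error_string(), failures can be reported by name instead of relying on assert(). This matters because assert() is compiled out when NDEBUG is defined, and the copy helpers above still carry "TODO: handle errors". The block below is a minimal sketch of that pattern, under these assumptions:

* ``status_t`` stands in for ``cl_int`` so that the block compiles without OpenCL headers.
* errorName() and throwOnOclError() are hypothetical names.
* errorName() covers only a few codes.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in for cl_int, which the OpenCL headers define as a 32-bit signed integer.
typedef std::int32_t status_t;

// Hypothetical, abbreviated version of ocl_get_error_string();
// the real function covers every code.
const char *errorName(status_t error)
{
    switch (error)
    {
        case 0: return "CL_SUCCESS";
        case -5: return "CL_OUT_OF_RESOURCES";
        case -36: return "CL_INVALID_COMMAND_QUEUE";
        default: return "Unknown OpenCL error";
    }
}

// Hypothetical helper: turn a failing status into an exception whose message
// names the failing call and the symbolic error, so the check survives in
// release (NDEBUG) builds, unlike a bare assert().
void throwOnOclError(status_t error, const char *call)
{
    assert(call != nullptr);
    if (error != 0)
    {
        throw std::runtime_error(std::string(call) + " failed: " + errorName(error)
                                 + " (" + std::to_string(error) + ")");
    }
}
```

Within |Gromacs| itself, this patch takes a similar approach in sync_ocl_event(). There it passes ocl_get_error_string(cl_error) as the message of GMX_RELEASE_ASSERT instead of using a bare assert().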
--- /dev/null
+/*
+ * This file is part of the GROMACS molecular simulation package.
+ *
+ * Copyright (c) 2014,2015, by the GROMACS development team, led by
+ * Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
+ * and including many others, as listed in the AUTHORS file in the
+ * top-level source directory and at http://www.gromacs.org.
+ *
+ * GROMACS is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1
+ * of the License, or (at your option) any later version.
+ *
+ * GROMACS is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with GROMACS; if not, see
+ * http://www.gnu.org/licenses, or write to the Free Software Foundation,
+ * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * If you want to redistribute modifications to GROMACS, please
+ * consider that scientific software is very special. Version
+ * control is crucial - bugs must be traceable. We will be happy to
+ * consider code for inclusion in the official distribution, but
+ * derived work must not be called official GROMACS. Details are found
+ * in the README & COPYING files - if they are missing, get the
+ * official version at http://www.gromacs.org.
+ *
+ * To help us fund GROMACS development, we humbly ask that you cite
+ * the research papers on the package. Check out http://www.gromacs.org.
+ */
+/*! \libinternal \file
+ * \brief Declare utility routines for OpenCL
+ *
+ * \author Anca Hamuraru <anca@streamcomputing.eu>
+ * \inlibraryapi
+ */
+#ifndef GMX_GPU_UTILS_OCLUTILS_H
+#define GMX_GPU_UTILS_OCLUTILS_H
+
+/*! \brief Declare to OpenCL SDKs that we intend to use OpenCL API
+ features that were deprecated in 2.0, so that they don't warn about
+ it. */
+#define CL_USE_DEPRECATED_OPENCL_2_0_APIS
+#ifdef __APPLE__
+# include <OpenCL/opencl.h>
+#else
+# include <CL/opencl.h>
+#endif
+
+/*! \brief OpenCL vendor IDs */
+typedef enum {
+ OCL_VENDOR_NVIDIA = 0,
+ OCL_VENDOR_AMD,
+ OCL_VENDOR_INTEL,
+ OCL_VENDOR_UNKNOWN
+} ocl_vendor_id_t;
+
+/*! \internal \brief OpenCL GPU device identifier
+ * An OpenCL device is identified by its ID.
+ * The platform ID is also included for caching reasons.
+ */
+typedef struct
+{
+ cl_platform_id ocl_platform_id; /**< Platform ID */
+ cl_device_id ocl_device_id; /**< Device ID */
+} ocl_gpu_id_t;
+
+/*! \internal \brief OpenCL GPU information
+ *
+ * \todo Move context and program outside this data structure.
+ * They are specific to a certain usage of the device (e.g. with/without OpenGL
+ * interop) and do not provide general device information as the data structure
+ * name indicates.
+ *
+ * TODO Document fields
+ */
+struct gmx_device_info_t
+{
+ //! @cond Doxygen_Suppress
+ ocl_gpu_id_t ocl_gpu_id;
+ char device_name[256];
+ char device_version[256];
+ char device_vendor[256];
+ int compute_units;
+ int adress_bits;
+ int stat;
+ ocl_vendor_id_t vendor_e;
+
+ cl_context context;
+ cl_program program;
+ //! @endcond Doxygen_Suppress
+
+};
+
+#if !defined(NDEBUG)
+/* Debugger callable function that prints the name of a kernel function pointer */
+cl_int dbg_ocl_kernel_name(const cl_kernel kernel);
+cl_int dbg_ocl_kernel_name_address(void* kernel);
+#endif
+
+
+/*! \brief Launches asynchronous host to device memory copy. */
+int ocl_copy_H2D_async(cl_mem d_dest, void * h_src,
+ size_t offset, size_t bytes,
+ cl_command_queue command_queue,
+ cl_event *copy_event);
+
+/*! \brief Launches asynchronous device to host memory copy. */
+int ocl_copy_D2H_async(void * h_dest, cl_mem d_src,
+ size_t offset, size_t bytes,
+ cl_command_queue command_queue,
+ cl_event *copy_event);
+
+/*! \brief Launches synchronous host to device memory copy. */
+int ocl_copy_H2D(cl_mem d_dest, void * h_src,
+ size_t offset, size_t bytes,
+ cl_command_queue command_queue);
+
+/*! \brief Allocate host memory in malloc style */
+void ocl_pmalloc(void **h_ptr, size_t nbytes);
+
+/*! \brief Free host memory in malloc style */
+void ocl_pfree(void *h_ptr);
+
++/*! \brief Convert error code to diagnostic string */
++const char *ocl_get_error_string(cl_int error);
++
+#endif
#include <limits>
#endif
-#include "gromacs/gmxlib/ocl_tools/oclutils.h"
-#include "gromacs/legacyheaders/types/force_flags.h"
-#include "gromacs/legacyheaders/types/hw_info.h"
-#include "gromacs/legacyheaders/types/simple.h"
+#include "gromacs/gpu_utils/oclutils.h"
+#include "gromacs/hardware/hw_info.h"
+#include "gromacs/mdlib/force_flags.h"
#include "gromacs/mdlib/nb_verlet.h"
#include "gromacs/mdlib/nbnxn_consts.h"
#include "gromacs/mdlib/nbnxn_pairlist.h"
#include "gromacs/pbcutil/ishift.h"
#include "gromacs/utility/cstringutil.h"
#include "gromacs/utility/fatalerror.h"
+ #include "gromacs/utility/gmxassert.h"
#include "nbnxn_ocl_types.h"
/* size of shmem (force-buffers/xq/atom type preloading) */
/* NOTE: with the default kernel on sm3.0 we need shmem only for pre-loading */
/* i-atom x+q in shared memory */
- //shmem = NCL_PER_SUPERCL * CL_SIZE * sizeof(float4);
shmem = NCL_PER_SUPERCL * CL_SIZE * sizeof(float) * 4; /* xqib */
/* cj in shared memory, for both warps separately */
shmem += 2 * NBNXN_GPU_JGROUP_SIZE * sizeof(int); /* cjs */
-#ifdef IATYPE_SHMEM // CUDA ARCH >= 300
+#ifdef IATYPE_SHMEM
+ /* FIXME: this should not be compile-time decided but rather at runtime.
+ * This issue propagated from the CUDA code, where the source-to-source
+ * compilation caused confusion about how to set up arch-dependent launch parameters.
+ * Here too this should be converted to a hardware/arch/generation dependent
+ * conditional when re-evaluating the need for i atom type preloading.
+ */
/* i-atom types in shared memory */
#pragma error "Should not be defined"
shmem += NCL_PER_SUPERCL * CL_SIZE * sizeof(int); /* atib */
cl_error = clEnqueueWaitForEvents(stream, 1, ocl_event);
#endif
- assert(CL_SUCCESS == cl_error);
+ GMX_RELEASE_ASSERT(CL_SUCCESS == cl_error, ocl_get_error_string(cl_error));
/* Release event and reset it to 0. It is ok to release it as enqueuewaitforevents performs implicit retain for events. */
cl_error = clReleaseEvent(*ocl_event);
*ocl_event = 0;
}
-/*! \brief Returns the duration in miliseconds for the command associated with the event.
+/*! \brief Returns the duration in milliseconds for the command associated with the event.
*
* It then releases the event and sets it to 0.
* Before calling this function, make sure the command has finished either by
/* don't launch non-local copy-back if there was no non-local work to do */
if (iloc == eintNonlocal && nb->plist[iloc]->nsci == 0)
{
+ /* TODO An alternative way to signal that non-local work is
+ complete is to use a clEnqueueMarker+clEnqueueBarrier
+ pair. However, the use of bNonLocalStreamActive has the
+ advantage of being local to the host, so probably minimizes
+ overhead. Curiously, for NVIDIA OpenCL with an empty-domain
+ test case, overall simulation performance was higher with
+ the API calls, but this has not been tested on AMD OpenCL,
+ so could be worth considering in future. */
+ nb->bNonLocalStreamActive = false;
return;
}
/* With DD the local D2H transfer can only start after the non-local
has been launched. */
- if (iloc == eintLocal && nb->bUseTwoStreams)
+ if (iloc == eintLocal && nb->bNonLocalStreamActive)
{
sync_ocl_event(stream, &(nb->nonlocal_done));
}
cl_error = clEnqueueMarker(stream, &(nb->nonlocal_done));
#endif
assert(CL_SUCCESS == cl_error);
+ nb->bNonLocalStreamActive = true;
}
/* only transfer energies in the local stream */
* transfers to finish.
*/
void nbnxn_gpu_wait_for_gpu(gmx_nbnxn_ocl_t *nb,
- const nbnxn_atomdata_t gmx_unused *nbatom,
int flags, int aloc,
real *e_lj, real *e_el, rvec *fshift)
{
"requested through environment variables.");
}
- /* CUDA: By default, on SM 3.0 and later use analytical Ewald, on earlier tabulated. */
- /* OpenCL: By default, use analytical Ewald, on earlier tabulated. */
- // TODO: decide if dev_info parameter should be added to recognize NVIDIA CC>=3.0 devices.
+ /* OpenCL: By default, use analytical Ewald.
+ * TODO: tabulated Ewald does not work and needs fixing; see init_nbparam() in nbnxn_ocl_data_mgmt.cpp
+ *
+ * TODO: decide if a dev_info parameter should be added to recognize NVIDIA CC>=3.0 devices.
+ */
//if ((dev_info->prop.major >= 3 || bForceAnalyticalEwald) && !bForceTabulatedEwald)
if ((1 || bForceAnalyticalEwald) && !bForceTabulatedEwald)
{
# include <CL/opencl.h>
#endif
-#include "gromacs/legacyheaders/types/interaction_const.h"
#include "gromacs/mdlib/nbnxn_pairlist.h"
+#include "gromacs/mdtypes/interaction_const.h"
#include "gromacs/utility/real.h"
/* kernel does #include "gromacs/math/utilities.h" */
cl_kernel kernel_zero_e_fshift;
///@}
- cl_bool bUseTwoStreams; /**< true if doing both local/non-local NB work on GPU */
+ cl_bool bUseTwoStreams; /**< true if doing both local/non-local NB work on GPU */
+ cl_bool bNonLocalStreamActive; /**< true indicates that the nonlocal_done event was enqueued */
- cl_atomdata_t *atdat; /**< atom data */
- cl_nbparam_t *nbparam; /**< parameters required for the non-bonded calc. */
- cl_plist_t *plist[2]; /**< pair-list data structures (local and non-local) */
- cl_nb_staging_t nbst; /**< staging area where fshift/energies get downloaded */
+ cl_atomdata_t *atdat; /**< atom data */
+ cl_nbparam_t *nbparam; /**< parameters required for the non-bonded calc. */
+ cl_plist_t *plist[2]; /**< pair-list data structures (local and non-local) */
+ cl_nb_staging_t nbst; /**< staging area where fshift/energies get downloaded */
- cl_mem debug_buffer; /**< debug buffer */
+ cl_mem debug_buffer; /**< debug buffer */
- cl_command_queue stream[2]; /**< local and non-local GPU queues */
+ cl_command_queue stream[2]; /**< local and non-local GPU queues */
/** events used for synchronization */
cl_event nonlocal_done; /**< event triggered when the non-local non-bonded kernel
#include "config.h"
+ #include <cstdio>
+
#include "gromacs/gmxpreprocess/grompp.h"
-#include "gromacs/legacyheaders/gmx_detect_hardware.h"
++#include "gromacs/hardware/detecthardware.h"
#include "gromacs/options/basicoptions.h"
-#include "gromacs/options/options.h"
+#include "gromacs/options/ioptionscontainer.h"
#include "gromacs/utility/basedefinitions.h"
#include "gromacs/utility/basenetwork.h"
-#include "gromacs/utility/file.h"
#include "gromacs/utility/gmxmpi.h"
+#include "gromacs/utility/textwriter.h"
#include "programs/mdrun/mdrun_main.h"
#include "testutils/cmdlinetest.h"
void
SimulationRunner::useStringAsMdpFile(const std::string &mdpString)
{
- gmx::File::writeFileFromString(mdpInputFileName_, mdpString);
+ gmx::TextWriter::writeFileFromString(mdpInputFileName_, mdpString);
}
void
SimulationRunner::useStringAsNdxFile(const char *ndxString)
{
- gmx::File::writeFileFromString(ndxFileName_, ndxString);
+ gmx::TextWriter::writeFileFromString(ndxFileName_, ndxString);
}
void
caller.addOption("-nsteps", nsteps_);
}
+#ifdef GMX_MPI
+# if GMX_GPU != GMX_GPU_NONE
+# ifdef GMX_THREAD_MPI
+ int numGpusNeeded = g_numThreads;
+# else /* Must be real MPI */
+ int numGpusNeeded = gmx_node_num();
+# endif
+ std::string gpuIdString(numGpusNeeded, '0');
+ caller.addOption("-gpu_id", gpuIdString.c_str());
+# endif
+#endif
+
#ifdef GMX_THREAD_MPI
- caller.addOption("-nt", g_numThreads);
+ caller.addOption("-ntmpi", g_numThreads);
#endif
#ifdef GMX_OPENMP
caller.addOption("-ntomp", g_numOpenMPThreads);
#endif
+ #if defined GMX_GPU
+ /* TODO Ideally, with real MPI, we could call
+ * gmx_collect_hardware_mpi() here and find out how many nodes
+ * mdrun will run on. For now, we assume that we're running on one
+ * node regardless of the number of ranks, because that's true in
+ * Jenkins and for most developers running the tests. */
+ int numberOfNodes = 1;
+ #if defined GMX_THREAD_MPI
+ /* Can't use gmx_node_num() because it is only valid after spawn of thread-MPI threads */
+ int numberOfRanks = g_numThreads;
+ #elif defined GMX_LIB_MPI
+ int numberOfRanks = gmx_node_num();
+ #else
+ int numberOfRanks = 1;
+ #endif
+ if (numberOfRanks > numberOfNodes && !gmx_multiple_gpu_per_node_supported())
+ {
+ if (gmx_node_rank() == 0)
+ {
+ fprintf(stderr, "GROMACS in this build configuration cannot run on more than one GPU per node,\nso with %d ranks and %d nodes, this test will disable GPU support.\n", numberOfRanks, numberOfNodes);
+ }
+ caller.addOption("-nb", "cpu");
+ }
+ #endif
return gmx_mdrun(caller.argc(), caller.argv());
}
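The test-runner changes above do two things. First, they build the ``-gpu_id`` argument as one ``'0'`` per rank, so that every rank on the node is mapped to GPU 0. Second, they fall back to ``-nb cpu`` when there are more ranks than nodes and the build cannot use more than one GPU per node. The sketch below restates that decision logic as two small functions. The names are hypothetical and not part of the test framework.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: an mdrun -gpu_id string mapping each of numRanks
// ranks on the node to GPU 0, e.g. 4 ranks -> "0000".
std::string allRanksOnGpuZero(int numRanks)
{
    assert(numRanks > 0);
    return std::string(numRanks, '0');
}

// Hypothetical helper: true when the test must pass "-nb cpu", i.e. more
// ranks than nodes would need GPUs and the build cannot drive more than one
// GPU per node.
bool mustDisableGpu(int numberOfRanks, int numberOfNodes, bool multipleGpuPerNodeSupported)
{
    return numberOfRanks > numberOfNodes && !multipleGpuPerNodeSupported;
}
```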