1. KNOWN LIMITATIONS
=================
-- Sharing an OpenCL GPU between two MPI ranks is not supported.
- See also Issue #91 - https://github.com/StreamComputing/gromacs/issues/91
-
-- Using more than one OpenCL GPU on a node is not known to work in all cases.
+- Currently there are no known limitations.
2. CODE IMPROVEMENTS
=================
file and shared between the host and the device.
See also Issue #16 - https://github.com/StreamComputing/gromacs/issues/16
-- Generating binary cache has a potential race condition in Multiple GPU runs
- See also Issue #71 - https://github.com/StreamComputing/gromacs/issues/71
-
-- Caching for OpenCL builds should detect when a rebuild is necessary
- See also Issue #72 - https://github.com/StreamComputing/gromacs/issues/72
-
- Quite a few error conditions are unhandled, noted with TODOs in several files
-- gmx_device_info_t needs struct field documentation
-
3. ENHANCEMENTS
============
- Implement OpenCL kernels for Intel GPUs
- Have one OpenCL program object per OpenCL kernel
See also Issue #86 - https://github.com/StreamComputing/gromacs/issues/86
+- Consider parallelising JIT of programs over CPU cores to improve startup
+ time
+
+- Re-consider caching JIT artefacts to improve startup time
+
4. OPTIMIZATIONS
=============
- Defining nbparam fields as constants when building the OpenCL kernels
- Unlike the CUDA version, the OpenCL implementation uses normal buffers
instead of textures
See also Issue #88 - https://github.com/StreamComputing/gromacs/issues/88
-
-6. TESTED CONFIGURATIONS
- =====================
-Tested devices:
- NVIDIA GPUs: GeForce GTX 660M, GeForce GTX 750Ti, GeForce GTX 780
- AMD GPUs: FirePro W5100, HD 7950, FirePro W9100, Radeon R7 M260, R9 290
-
-Tested kernels:
-Kernel |Benchmark test |Remarks
---------------------------------------------------------------------------------------------------------
-nbnxn_kernel_ElecCut_VdwLJ_VF_prune_opencl |d.poly-ch2 |
-nbnxn_kernel_ElecCut_VdwLJ_F_opencl |d.poly-ch2 |
-nbnxn_kernel_ElecCut_VdwLJ_F_prune_opencl |d.poly-ch2 |
-nbnxn_kernel_ElecCut_VdwLJ_VF_opencl |d.poly-ch2 |
-nbnxn_kernel_ElecRF_VdwLJ_VF_prune_opencl |adh_cubic with rf_verlet.mdp |
-nbnxn_kernel_ElecRF_VdwLJ_F_opencl |adh_cubic with rf_verlet.mdp |
-nbnxn_kernel_ElecRF_VdwLJ_F_prune_opencl |adh_cubic with rf_verlet.mdp |
-nbnxn_kernel_ElecEwQSTab_VdwLJ_VF_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |Failed
-nbnxn_kernel_ElecEwQSTab_VdwLJ_F_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |Failed
-nbnxn_kernel_ElecEw_VdwLJ_VF_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
-nbnxn_kernel_ElecEw_VdwLJ_F_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
-nbnxn_kernel_ElecEw_VdwLJ_F_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
-nbnxn_kernel_ElecEwTwinCut_VdwLJ_F_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
-nbnxn_kernel_ElecEwTwinCut_VdwLJ_F_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
-
-Input data used for testing - Benchmark data sets available here:
-ftp://ftp.gromacs.org/pub/benchmarks
-