Extend task assignment code
author Mark Abraham <mark.j.abraham@gmail.com>
Wed, 23 Aug 2017 14:06:44 +0000 (16:06 +0200)
committer Mark Abraham <mark.j.abraham@gmail.com>
Fri, 24 Nov 2017 13:49:52 +0000 (14:49 +0100)
commit e87a5310c36130ac1d26867f8f7e597dcd9fd513
tree fd92ea86de0df8d02e6f884ecef7d09d6c8958b2
parent 8a2a2e6f33e8c9b8d85a161db478a64793183094

Existing behaviour is largely unchanged, apart from some details of
how conditions that prevent task assignment are handled, and when.

However, it is not feasible in the longer term to continue to implement
a way for gmx mdrun -gpu_id to imply the thread-MPI rank split, so
that is now disabled, and attempting it produces a useful error
message. Instead, for both real and thread MPI, -gpu_id now limits the
available GPU IDs (issuing an error if there are any duplicates),
somewhat like CUDA_VISIBLE_DEVICES. The new mdrun -gputasks option
specifies a full GPU task assignment, and must be accompanied by a
choice of ranks and what kind of device receives tasks of each type.
Documentation is updated accordingly.
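
As an illustration of the duplicate check described above, here is a
minimal sketch. It is not the code from this change: the function name
parseAvailableGpuIds is hypothetical, and the input format (a string
of single-digit device IDs such as "013") is an assumption made only
for the example.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not the GROMACS implementation. Assumes the
// user supplies a string of single-digit device IDs, e.g. "013".
// Returns the IDs in the order given; throws if a character is not a
// digit or if any ID appears more than once.
std::vector<int> parseAvailableGpuIds(const std::string& gpuIdString)
{
    std::vector<int> ids;
    std::set<int>    seen;
    for (char c : gpuIdString)
    {
        if (c < '0' || c > '9')
        {
            throw std::invalid_argument("GPU ID string contains a non-digit character");
        }
        const int id = c - '0';
        // insert() reports whether the ID was new; a repeat is an error.
        if (!seen.insert(id).second)
        {
            throw std::invalid_argument("GPU ID " + std::to_string(id)
                                        + " was given more than once");
        }
        ids.push_back(id);
    }
    return ids;
}
```

With this sketch, "013" yields the IDs 0, 1 and 3, while "011" is
rejected because ID 1 is repeated.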

Aspects of the implementation anticipate the extension to support
long-ranged PME interactions on GPUs, and others in future, so that
the task assignment on a node now takes the form of a container of
tasks, potentially of different types, on each rank of the node. A
flat vector of ints is no longer sufficient.
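
The shape of such a container might look roughly like the sketch
below. All type and function names here (GpuTask, GpuTaskMapping,
RankTaskAssignment, NodeTaskAssignment, countTasksOnDevice) are
illustrative assumptions, not necessarily those used in the new
taskassignment files.

```cpp
#include <cstddef>
#include <vector>

// Kinds of task that could be placed on a GPU (illustrative only).
enum class GpuTask
{
    Nonbonded,
    Pme
};

// One task of a given type mapped to one device ID.
struct GpuTaskMapping
{
    GpuTask task;
    int     deviceId;
};

// All GPU tasks on one rank; may be empty, or mix task types.
using RankTaskAssignment = std::vector<GpuTaskMapping>;
// The assignment for every rank on a node, indexed by rank.
using NodeTaskAssignment = std::vector<RankTaskAssignment>;

// Counts how many tasks, over all ranks of the node, use a device.
std::size_t countTasksOnDevice(const NodeTaskAssignment& node, int deviceId)
{
    std::size_t count = 0;
    for (const auto& rank : node)
    {
        for (const auto& mapping : rank)
        {
            if (mapping.deviceId == deviceId)
            {
                ++count;
            }
        }
    }
    return count;
}
```

A flat vector of ints can record only one device ID per rank, whereas
this layout lets a rank hold zero, one, or several tasks of different
types.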

Errors, e.g. from inconsistent user input, are now handled with
exceptions, so that the runner can take responsibility for reporting
them correctly, rather than the program always aborting at the point
where the issue is detected.
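
The pattern can be sketched as follows. The exception class, the
check, and the runner-like caller are all hypothetical names chosen
for this example; the actual exception types and call sites in the
change may differ.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type for inconsistent user input.
class InconsistentInputError : public std::runtime_error
{
public:
    using std::runtime_error::runtime_error;
};

// Low-level check: throws instead of aborting where the problem is found.
void checkTaskCount(int numGpuTasks, int numGpuIdsGiven)
{
    if (numGpuTasks != numGpuIdsGiven)
    {
        throw InconsistentInputError(
                "Got " + std::to_string(numGpuIdsGiven) + " GPU IDs for "
                + std::to_string(numGpuTasks) + " GPU tasks");
    }
}

// Runner-like caller: decides how the error is reported. Returns 0 on
// success, or 1 after storing the error text in *message.
int runWithReporting(int numGpuTasks, int numGpuIdsGiven, std::string* message)
{
    try
    {
        checkTaskCount(numGpuTasks, numGpuIdsGiven);
    }
    catch (const InconsistentInputError& e)
    {
        *message = e.what();
        return 1;
    }
    return 0;
}
```

The detecting code only describes the problem; the caller chooses
whether to print, clean up, or terminate.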

gmx tune_pme now explicitly supports only the new form of -gpu_id,
though it would not be difficult to support -gputasks if there
were a need.

Change-Id: I0c149913bd43418d374171f5f95dad7f25d3cfe4
23 files changed:
docs/user-guide/environment-variables.rst
docs/user-guide/mdrun-features.rst
docs/user-guide/mdrun-performance.rst
src/gromacs/ewald/tests/testhardwarecontexts.cpp
src/gromacs/gmxana/gmx_tune_pme.cpp
src/gromacs/hardware/detecthardware.cpp
src/gromacs/hardware/hw_info.h
src/gromacs/taskassignment/CMakeLists.txt
src/gromacs/taskassignment/decidegpuusage.cpp [new file with mode: 0644]
src/gromacs/taskassignment/decidegpuusage.h [new file with mode: 0644]
src/gromacs/taskassignment/findallgputasks.cpp [new file with mode: 0644]
src/gromacs/taskassignment/findallgputasks.h [new file with mode: 0644]
src/gromacs/taskassignment/hardwareassign.cpp [deleted file]
src/gromacs/taskassignment/hardwareassign.h [deleted file]
src/gromacs/taskassignment/reportgpuusage.cpp [new file with mode: 0644]
src/gromacs/taskassignment/reportgpuusage.h [new file with mode: 0644]
src/gromacs/taskassignment/resourcedivision.cpp
src/gromacs/taskassignment/resourcedivision.h
src/gromacs/taskassignment/taskassignment.cpp [new file with mode: 0644]
src/gromacs/taskassignment/taskassignment.h [new file with mode: 0644]
src/gromacs/taskassignment/usergpuids.h
src/programs/mdrun/mdrun.cpp
src/programs/mdrun/runner.cpp