BioD PNPI Git Repos - alexxy/gromacs.git/commit

author	Alan Gray <alangray3@gmail.com>
	Wed, 29 Sep 2021 14:10:58 +0000 (14:10 +0000)
committer	Andrey Alekseenko <al42and@gmail.com>
	Wed, 29 Sep 2021 14:10:58 +0000 (14:10 +0000)
commit	e0da8cce105ec43120c59a6d64611cc3f4f2c610
tree	7fc7d385dcfc8d18ec66ece154810384c26bd5c8
parent	c1c4f2111e8e0e9256cf1c89ce9fa28d19ac82f0

Avoid MPI sync for PME force sender GPU scheduling code and thread API calls

Replaces the synchronous PME-PP MPI communication of the event at every
step with an exchange of the event address and an associated flag on
search steps only. The PP rank now ensures that the event has been
recorded before enqueueing it, by spinning on a flag written by the PME
rank in shared CPU memory. This allows not only asynchronous progress
by the PME rank, but also OpenMP parallelization of the cudaMemcpy
launches to the multiple PP ranks, so that the CUDA API overheads
overlap.
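
The flag handshake described above can be sketched in plain C++. This is
an illustration only, not the code from this commit: std::thread stands
in for the PME and PP ranks, an int stands in for the recorded GPU
event, and the names SharedEventSlot, pmeRecordEvent, ppWaitForEvent and
runHandshake are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the state one PME rank shares with one PP rank.
struct SharedEventSlot
{
    int              eventPayload = 0;   // stands in for the recorded GPU event
    std::atomic<int> eventRecorded{ 0 }; // flag in shared CPU memory
};

// "PME" side: record the event first, then publish the flag.
// The release store makes the payload write visible to the spinning reader.
void pmeRecordEvent(SharedEventSlot* slot, int value)
{
    slot->eventPayload = value;
    slot->eventRecorded.store(1, std::memory_order_release);
}

// "PP" side: spin until the flag is set, consume the event, reset the flag
// so the slot can be reused on the next step.
int ppWaitForEvent(SharedEventSlot* slot)
{
    while (slot->eventRecorded.load(std::memory_order_acquire) != 1)
    {
        std::this_thread::yield(); // assumption: yield while spinning
    }
    const int value = slot->eventPayload;
    slot->eventRecorded.store(0, std::memory_order_release);
    return value;
}

// Runs one handshake across two threads; returns what the PP side observed.
int runHandshake(int value)
{
    SharedEventSlot slot;
    int             observed = -1;
    std::thread     pp([&] { observed = ppWaitForEvent(&slot); });
    std::thread     pme([&] { pmeRecordEvent(&slot, value); });
    pme.join();
    pp.join();
    assert(slot.eventRecorded.load() == 0); // flag was reset by the PP side
    return observed;
}
```

Calling runHandshake(42) returns 42 regardless of which thread starts
first, because the reader never touches the payload until it has seen
the flag. No blocking message exchange is needed per step, which is the
property the commit message attributes to the new scheme.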

Partly addresses #4047

src/gromacs/ewald/pme_force_sender_gpu_impl.cu
src/gromacs/ewald/pme_force_sender_gpu_impl.h
src/gromacs/ewald/pme_only.cpp
src/gromacs/ewald/pme_pp_comm_gpu_impl.cu
src/gromacs/ewald/pme_pp_comm_gpu_impl.h