With 1 PP + 1 PME rank, the GpuBonded constructor is passed the
non-local nonbonded stream, which is nullptr. As a result, the bonded
kernels are launched in the default stream, which blocks concurrent
kernel execution.
This change ensures that the GpuBonded constructor is passed the
non-local stream only when there is PP domain decomposition, i.e. when
more than one rank does PP work.
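The stream choice can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the GROMACS code. The `Locality` enum, `havePPDomainDecompositionSketch` and `bondedStreamLocality` are hypothetical names. The two rank counts stand in for `cr->nnodes` and `cr->npmenodes` from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the nonbonded interaction localities
// (the patch itself uses eintLocal / eintNonlocal).
enum class Locality
{
    Local,
    NonLocal
};

// Hypothetical helper mirroring the condition added in the patch:
// domain decomposition is active AND more than one rank does PP work.
// numRanks and numPmeRanks stand in for cr->nnodes and cr->npmenodes.
bool havePPDomainDecompositionSketch(bool isDomainDecomposition, int numRanks, int numPmeRanks)
{
    // Sanity check: PME-only ranks are a subset of all ranks.
    assert(numPmeRanks >= 0 && numPmeRanks <= numRanks);
    return isDomainDecomposition && (numRanks - numPmeRanks > 1);
}

// Choose which nonbonded stream the bonded kernels should use.
// With 1 PP + 1 PME rank the non-local stream is nullptr, and launching
// into it would fall back to the default stream, so use the local one.
Locality bondedStreamLocality(bool isDomainDecomposition, int numRanks, int numPmeRanks)
{
    return havePPDomainDecompositionSketch(isDomainDecomposition, numRanks, numPmeRanks)
                   ? Locality::NonLocal
                   : Locality::Local;
}
```

With 1 PP + 1 PME rank, domain decomposition is set up but only one rank does PP work. In that case `bondedStreamLocality(true, 2, 1)` yields `Locality::Local`. With 2 PP + 1 PME rank, `bondedStreamLocality(true, 3, 1)` yields `Locality::NonLocal`.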
Fixes #3241
Change-Id: I858401b78c620adc3bea176e40e6fa179e583483
:issue:`3176`
Fix duplicate PDB CONECT record output
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+""""""""""""""""""""""""""""""""""""""
PDB "CONECT" record output was duplicated in some instances. Since |Gromacs| does
not use these anywhere, analysis was not affected. The behavior is now fixed.
:issue:`3206`
+Fix performance issue with bonded interactions in wrong GPU stream
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
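+With 1 PP rank and 1 PME rank, the bonded GPU kernels were launched in the default stream, which blocks concurrent kernel execution. This could lead to a significant loss in performance.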
+
+:issue:`3241`
if (useGpuForBonded)
{
- auto stream = DOMAINDECOMP(cr) ?
+ // TODO use havePPDomainDecomposition here to simplify the code.
+ auto stream = (DOMAINDECOMP(cr) && (cr->nnodes - cr->npmenodes > 1)) ?
nbnxn_gpu_get_command_stream(fr->nbv->gpu_nbv, eintNonlocal) :
nbnxn_gpu_get_command_stream(fr->nbv->gpu_nbv, eintLocal);
// TODO the heap allocation is only needed while