we always call the local kernel, the local x+q copy and later (not in
this function) the stream wait, local f copyback and the f buffer
clearing. All these operations, except for the local interaction kernel,
- are needed for the non-local interactions. */
+ are needed for the non-local interactions. The skip of the local kernel
+ call is taken care of later in this function. */
if (iloc == eintNonlocal && plist->nsci == 0)
{
return;
CU_RET_ERR(stat, "cudaEventRecord failed");
}
+ if (plist->nsci == 0)
+ {
+ /* Don't launch an empty local kernel (not allowed with CUDA) */
+ return;
+ }
+
/* beginning of timed nonbonded calculation section */
if (bDoTime)
{