From: Szilard Pall
Date: Fri, 11 Jan 2013 07:05:04 +0000 (+0100)
Subject: time the GPU force clearing async launch overhead
X-Git-Url: http://biod.pnpi.spb.ru/gitweb/?a=commitdiff_plain;h=381e088ffa7ea3d44f98df424bd0b63b831e1216;p=alexxy%2Fgromacs.git

time the GPU force clearing async launch overhead

Unfortunately, the asynchronous launch of GPU force buffer clearing can
take up to 2% of the total run-time with short iteration times and
many/fast cores/GPU. Timing it will at least remove it from the "Rest".

Change-Id: I397c563ead24d87181de1b03879f164d1a97c2ca
---

diff --git a/src/mdlib/sim_util.c b/src/mdlib/sim_util.c
index eb9f636140..8f211715d9 100644
--- a/src/mdlib/sim_util.c
+++ b/src/mdlib/sim_util.c
@@ -1318,7 +1318,10 @@ void do_force_cutsVERLET(FILE *fplog,t_commrec *cr,
             wallcycle_stop(wcycle,ewcWAIT_GPU_NB_L);
 
             /* now clear the GPU outputs while we finish the step on the CPU */
+
+            wallcycle_start_nocount(wcycle,ewcLAUNCH_GPU_NB);
             nbnxn_cuda_clear_outputs(nbv->cu_nbv, flags);
+            wallcycle_stop(wcycle,ewcLAUNCH_GPU_NB);
         }
         else
         {
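
Note on the "nocount" start used above: the sketch below illustrates, under stated assumptions, why a no-count start fits here. The clearing launch is charged to the same counter as the earlier kernel launch in the step, so the elapsed time should be added without registering a second call. All names (cycle_counter_t, counter_start, counter_start_nocount, counter_stop, simulate_step) are hypothetical stand-ins and not the GROMACS wallcycle API; clock() is used only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a single wallcycle counter entry. */
typedef struct {
    int    n;      /* number of counted start/stop pairs */
    double total;  /* accumulated time in seconds */
    double start;  /* timestamp taken at the last start */
} cycle_counter_t;

static double now_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Counted start: registers one more call and records the start time. */
static void counter_start(cycle_counter_t *c)
{
    assert(c != NULL);
    c->n++;
    c->start = now_seconds();
}

/* No-count start: records the start time but leaves the call count
 * unchanged, so the time is folded into an already counted call. */
static void counter_start_nocount(cycle_counter_t *c)
{
    counter_start(c);
    c->n--;
}

/* Stop: adds the time elapsed since the last start to the total. */
static void counter_stop(cycle_counter_t *c)
{
    assert(c != NULL);
    c->total += now_seconds() - c->start;
}

/* One simulated MD step: the kernel launch is counted once, and the
 * later output-clearing launch only adds time to the same counter. */
static void simulate_step(cycle_counter_t *launch)
{
    counter_start(launch);          /* counted: kernel launch */
    counter_stop(launch);

    counter_start_nocount(launch);  /* not counted: clear outputs */
    counter_stop(launch);
}
```

With this scheme the per-step call count of the launch counter stays at one, while the launch overhead of the clearing shows up in the counter's total instead of in the unaccounted remainder.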