Fix DD load balancing bug with GPU sharing
author Szilárd Páll <pall.szilard@gmail.com>
Tue, 19 Nov 2013 02:00:24 +0000 (03:00 +0100)
committer Gerrit Code Review <gerrit@gerrit.gromacs.org>
Tue, 19 Nov 2013 15:18:03 +0000 (16:18 +0100)
The recent DD load balancing fix that corrected the imbalance measure
with GPU sharing (ba8232e9) indexed the GPUs incorrectly, which caused
out-of-bounds indexing in the GPU ID query function. The query function
also had a bug in its error checking that let the incorrect index
through.
In addition, mdrun -nb cpu -gpu_id ... is now allowed; previously it
gave a fatal error.

This commit addresses both issues; fixes #1385

Change-Id: I2800f610b873da92afe78bbfd869258f378ba2d7
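
The error-checking bug described in the message can be shown in isolation. The sketch below is not GROMACS code: `old_check_rejects` and `index_in_range` are hypothetical helpers that only mirror the two predicates seen in the `get_gpu_device_id` hunk. It assumes a non-negative element count `n`, in which case the old `&&` condition can never be true and so never rejects anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the removed check. With &&, the index
 * would have to be both negative and >= n at the same time, which is
 * impossible for any n >= 0, so this never reports an error. */
static bool old_check_rejects(int idx, int n)
{
    return idx < 0 && idx >= n;
}

/* Hypothetical helper mirroring the new assertion's predicate:
 * the index is valid only if it lies in [0, n). */
static bool index_in_range(int idx, int n)
{
    return idx >= 0 && idx < n;
}
```

With `n = 2`, `old_check_rejects(5, 2)` and `old_check_rejects(-1, 2)` both return false, so an out-of-range index slips through, while `index_in_range(5, 2)` correctly returns false. An `||` in the old condition would also have caught the bad index; the commit instead turns the check into an assertion.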

src/gmxlib/gpu_utils/gpu_utils.cu
src/kernel/runner.c
src/mdlib/domdec.c

diff --git a/src/gmxlib/gpu_utils/gpu_utils.cu b/src/gmxlib/gpu_utils/gpu_utils.cu
index ee3d5e10d6a99e6d9fa700edc67cd0ac8d05b030..24fc7557a6a261c52c415fbd1d5e0e63bf2474ad 100644
@@ -860,10 +860,7 @@ int get_gpu_device_id(const gmx_gpu_info_t *gpu_info,
 {
     assert(gpu_info);
     assert(gpu_opt);
-    if (idx < 0 && idx >= gpu_opt->ncuda_dev_use)
-    {
-        return -1;
-    }
+    assert(idx >= 0 && idx < gpu_opt->ncuda_dev_use);
 
     return gpu_info->cuda_dev[gpu_opt->cuda_dev_use[idx]].id;
 }
diff --git a/src/kernel/runner.c b/src/kernel/runner.c
index 17b5f351ad044b5756ad317285078c0d1c55671c..68ea884a0b603c2d0a3bfbff6fd7442ac37b51b6 100644
@@ -1481,6 +1481,11 @@ int mdrunner(gmx_hw_opt_t *hw_opt,
         gmx_select_gpu_ids(fplog, cr, &hwinfo->gpu_info, bForceUseGPU,
                            &hw_opt->gpu_opt);
     }
+    else
+    {
+        /* Ignore (potentially) manually selected GPUs */
+        hw_opt->gpu_opt.ncuda_dev_use = 0;
+    }
 
     /* check consistency of CPU acceleration and number of GPUs selected */
     gmx_check_hw_runconf_consistency(fplog, hwinfo, cr, hw_opt, bUseGPU);
diff --git a/src/mdlib/domdec.c b/src/mdlib/domdec.c
index d488b0bdf16840133bdf9ddecf4d28a6a9352969..92fa8c16406004ebb400b01a8d846e7590d449ff 100644
@@ -5697,7 +5697,7 @@ void dd_setup_dlb_resource_sharing(t_commrec *cr,
 
     physicalnode_id_hash = gmx_physicalnode_id_hash();
 
-    gpu_id = get_gpu_device_id(&hwinfo->gpu_info, &hw_opt->gpu_opt, cr->nodeid);
+    gpu_id = get_gpu_device_id(&hwinfo->gpu_info, &hw_opt->gpu_opt, cr->rank_pp_intranode);
 
     dd = cr->dd;
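
The last hunk swaps the global rank (`cr->nodeid`) for the intranode PP rank (`cr->rank_pp_intranode`) as the index into the list of GPUs in use. A minimal sketch of why this matters follows. It is an illustration under stated assumptions, not the GROMACS implementation: `lookup_gpu_id` and `intranode_rank` are hypothetical helpers, the GPU list is assumed to hold one entry per PP rank on the node, and ranks are assumed to be placed consecutively with the same number on every node, so a simple modulo gives the intranode rank. How GROMACS actually computes `rank_pp_intranode` is not shown in this diff.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-node list of GPU IDs in use
 * (one entry per PP rank on the node). The assertion mirrors the
 * range check added to get_gpu_device_id in this commit. */
static int lookup_gpu_id(const int *dev_use, int ndev_use, int idx)
{
    assert(idx >= 0 && idx < ndev_use);
    return dev_use[idx];
}

/* Hypothetical mapping from global rank to rank within the node.
 * Assumption: consecutive rank placement, equal rank count per node. */
static int intranode_rank(int global_rank, int ranks_per_node)
{
    return global_rank % ranks_per_node;
}
```

For example, with two nodes of four ranks each and a per-node list `{0, 0, 1, 1}`, global rank 5 lives on the second node. Using 5 directly as the index would read past the four-entry list, whereas `intranode_rank(5, 4)` yields 1, which selects GPU 0.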