Set CUDA hardware constants for CC 7.5, 8.6
author Andrey Alekseenko <al42and@gmail.com>
Wed, 7 Oct 2020 12:54:22 +0000 (14:54 +0200)
committer Artem Zhmurov <zhmurov@gmail.com>
Thu, 8 Oct 2020 09:40:37 +0000 (09:40 +0000)
Devices of CC 7.5 and 8.6 have lower per-SM limits on the number of
resident threads and blocks than all other CC 5.x+ architectures.

Source: CUDA Occupancy Calculator,
https://docs.nvidia.com/cuda/cuda-occupancy-calculator/CUDA_Occupancy_Calculator.xls,
accessed 2020-10-07.
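To illustrate why the lower limits matter, here is a minimal sketch that computes the upper bound on resident blocks per SM implied by these two limits alone. The struct `SmLimits`, the constants `kCC70` to `kCC86` and the function `maxResidentBlocks` are hypothetical names for illustration only. Register use, shared memory and warp-allocation granularity are ignored, so real occupancy can only be lower. The CC 7.0/8.0 values (32 blocks, 2048 threads per SM) come from NVIDIA's compute-capability table rather than from the hunk below.

```cpp
#include <algorithm>

// Hypothetical per-SM hardware limits, for illustration only.
struct SmLimits
{
    int maxBlocksPerMP;
    int maxThreadsPerMP;
};

// CC 7.5 and 8.6 values match this commit; CC 7.0/8.0 values are taken
// from NVIDIA's compute-capability table (assumption, not shown in the diff).
constexpr SmLimits kCC70{ 32, 2048 };
constexpr SmLimits kCC75{ 16, 1024 };
constexpr SmLimits kCC80{ 32, 2048 };
constexpr SmLimits kCC86{ 16, 1536 };

// Upper bound on resident blocks per SM from the block and thread limits only.
// Ignores registers, shared memory and allocation granularity.
// Assumes threadsPerBlock > 0. Requires C++14 (constexpr std::min).
constexpr int maxResidentBlocks(SmLimits lim, int threadsPerBlock)
{
    return std::min(lim.maxBlocksPerMP, lim.maxThreadsPerMP / threadsPerBlock);
}
```

For example, with 128 threads per block this bound is 16 blocks on CC 7.0 and 8.0, but only 8 on CC 7.5 and 12 on CC 8.6, which is why a single set of constants for all CC 5.x+ devices overestimates what these two architectures can hold.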

src/gromacs/gpu_utils/cuda_arch_utils.cuh

index b3c4da9b1043ec22461b9243a0c97ea7c2213dd0..86b50cf3c6539b955f30269718726c317e745088 100644
@@ -93,9 +93,15 @@ static const bool c_disableCudaTextures = DISABLE_CUDA_TEXTURES;
 #    if GMX_PTX_ARCH <= 370 // CC 3.x
 #        define GMX_CUDA_MAX_BLOCKS_PER_MP 16
 #        define GMX_CUDA_MAX_THREADS_PER_MP 2048
-#    else // CC 5.x, 6.x
+#    elif GMX_PTX_ARCH == 750 // CC 7.5, lower limits compared to 7.0
+#        define GMX_CUDA_MAX_BLOCKS_PER_MP 16
+#        define GMX_CUDA_MAX_THREADS_PER_MP 1024
+#    elif GMX_PTX_ARCH == 860 // CC 8.6, lower limits compared to 8.0
+#        define GMX_CUDA_MAX_BLOCKS_PER_MP 16
+#        define GMX_CUDA_MAX_THREADS_PER_MP 1536
+#    else // CC 5.x, 6.x, 7.0, 8.0
 /* Note that this final branch covers all future architectures (current gen
- * is 6.x as of writing), hence assuming that these *currently defined* upper
+ * is 8.x as of writing), hence assuming that these *currently defined* upper
  * limits will not be lowered.
  */
 #        define GMX_CUDA_MAX_BLOCKS_PER_MP 32
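The branch order of the preprocessor chain can be checked with a hypothetical runtime mirror such as the sketch below. `MpLimits` and `cudaMpLimits` are illustrative names, not part of GROMACS; the argument uses the same integer encoding as `GMX_PTX_ARCH` (e.g. 750 for CC 7.5). The thread limit of 2048 in the final branch is not visible in the hunk above and is assumed from NVIDIA's compute-capability table.

```cpp
// Hypothetical per-SM limits returned by the lookup below.
struct MpLimits
{
    int maxBlocksPerMP;
    int maxThreadsPerMP;
};

// Mirrors the #if/#elif chain: the "<= 370" test is evaluated first, then the
// two exact matches, and every other value falls through to the final branch.
constexpr MpLimits cudaMpLimits(int ptxArch)
{
    return ptxArch <= 370   ? MpLimits{ 16, 2048 }  // CC 3.x
           : ptxArch == 750 ? MpLimits{ 16, 1024 }  // CC 7.5
           : ptxArch == 860 ? MpLimits{ 16, 1536 }  // CC 8.6
                            : MpLimits{ 32, 2048 }; // CC 5.x, 6.x, 7.0, 8.0 and later (assumed 2048 threads)
}
```

Because 7.5 and 8.6 are matched exactly, any architecture not listed lands in the final branch, which is the "limits will not be lowered" assumption stated in the comment.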