This change adds the -cl-denorms-are-zero flag by default to the options
used for kernel compilation. This is done to:
- avoid a large performance penalty on AMD Vega with ROCm (which by
default handles denorms on GFX9 or later).
- make the defaults uniform across CUDA and OpenCL.
Fixes #2593
Change-Id: I9e6183c4367b5960e0e21f1dd342d7695acfbc44
Miscellaneous
^^^^^^^^^^^^^
+
+Improve OpenCL kernel performance on AMD Vega GPUs
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+The OpenCL kernel optimization flags did not explicitly turn off denorm handling,
+which could lead to a performance loss. Flushing denorms to zero is now explicitly
+enabled, both for consistency with CUDA and for performance reasons.
+On AMD Vega GPUs (with ROCm), kernel performance improves by up to 30%.
+
+
if (getenv("GMX_OCL_DISABLE_FASTMATH") == NULL)
{
compilerOptions += " -cl-fast-relaxed-math";
+
+ // Hint to the compiler that it can flush denorms to zero.
+ // In CUDA this is triggered by the -use_fast_math flag, which is equivalent
+ // to -cl-fast-relaxed-math, hence its inclusion in this conditional block.
+ compilerOptions += " -cl-denorms-are-zero";
}
if ((deviceVendorId == OCL_VENDOR_NVIDIA) && getenv("GMX_OCL_VERBOSE"))