From: Szilárd Páll
Date: Mon, 30 Jul 2018 16:18:37 +0000 (+0200)
Subject: Request flushing denorms to zero in OpenCL
X-Git-Url: http://biod.pnpi.spb.ru/gitweb/?a=commitdiff_plain;h=c8096cb80eab8801a6a669c69a15f1f6b3a6c167;p=alexxy%2Fgromacs.git

Request flushing denorms to zero in OpenCL

This change adds by default the -cl-denorms-are-zero to the flags used
for kernel compilation. This is done to:
- avoid a large performance penalty on AMD Vega with ROCm (which by
  default handles denorms on GFX9 or later).
- make the defaults uniform across CUDA and OpenCL.

Fixes #2593

Change-Id: I9e6183c4367b5960e0e21f1dd342d7695acfbc44
---

diff --git a/docs/release-notes/2018/2018.3.rst b/docs/release-notes/2018/2018.3.rst
index 58d8bef243..2bda2a55c8 100644
--- a/docs/release-notes/2018/2018.3.rst
+++ b/docs/release-notes/2018/2018.3.rst
@@ -82,3 +82,12 @@ Fixes to improve portability
 
 Miscellaneous
 ^^^^^^^^^^^^^
+
+Improve OpenCL kernel performance on AMD Vega GPUs
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+The OpenCL kernel optimization flags did not explicitly turn off denorm handling
+which could lead to performance loss. The optimization is now explicitly turned
+on both for consistency with CUDA and performance reasons.
+On AMD Vega GPUs (with ROCm) kernel performance improves by up to 30%.
+
+
diff --git a/src/gromacs/gpu_utils/ocl_compiler.cpp b/src/gromacs/gpu_utils/ocl_compiler.cpp
index 957c6bc453..f54a94fa42 100644
--- a/src/gromacs/gpu_utils/ocl_compiler.cpp
+++ b/src/gromacs/gpu_utils/ocl_compiler.cpp
@@ -179,6 +179,11 @@ selectCompilerOptions(ocl_vendor_id_t deviceVendorId)
     if (getenv("GMX_OCL_DISABLE_FASTMATH") == NULL)
     {
         compilerOptions += " -cl-fast-relaxed-math";
+
+        // Hint to the compiler that it can flush denorms to zero.
+        // In CUDA this is triggered by the -use_fast_math flag, equivalent with
+        // -cl-fast-relaxed-math, hence the inclusion on the conditional block.
+        compilerOptions += " -cl-denorms-are-zero";
     }
 
     if ((deviceVendorId == OCL_VENDOR_NVIDIA) && getenv("GMX_OCL_VERBOSE"))
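
Note on the second hunk: the effect of the change can be illustrated outside GROMACS with a minimal sketch. The helper names `fastMathOptions` and `fastMathOptionsFromEnvironment` are hypothetical and not part of the patch; only the two flag strings and the `GMX_OCL_DISABLE_FASTMATH` variable are taken from the diff above. The sketch assumes, as the patch does, that both flags share the same conditional block.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-alone helper (not the GROMACS function): returns the
// math-related OpenCL build options depending on whether fast math has
// been disabled.
std::string fastMathOptions(bool fastMathDisabled)
{
    std::string options;
    if (!fastMathDisabled)
    {
        options += " -cl-fast-relaxed-math";
        // Added by this change: let the compiler flush denorms to zero.
        options += " -cl-denorms-are-zero";
    }
    return options;
}

// Hypothetical wrapper that checks the same environment variable as the patch.
std::string fastMathOptionsFromEnvironment()
{
    return fastMathOptions(std::getenv("GMX_OCL_DISABLE_FASTMATH") != nullptr);
}
```

Because the new flag lives in the same block, setting `GMX_OCL_DISABLE_FASTMATH` drops both options at once. A string built this way would typically be passed as the `options` argument of `clBuildProgram`.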