Re-enable local-memory prefetch of i-atom types in OpenCL

For unknown reasons, this prefetching was disabled in the original
OpenCL implementation. However, it turns out to give substantial
performance benefits, especially on AMD (>10%) and in some cases on
NVIDIA too (although not on Maxwell).

This change re-enables the prefetching code path and turns it on by
default for AMD devices; for NVIDIA, the decision will be revisited
later. The GMX_OCL_ENABLE_I_PREFETCH and GMX_OCL_DISABLE_I_PREFETCH
environment variables allow explicitly enabling or disabling
prefetching, e.g. to evaluate it on future architectures or compilers.
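A minimal sketch of the selection logic described above, in C++ (assumed here as the host language). The `DeviceVendor` enum and both functions are hypothetical names, not the actual GROMACS code. The precedence is also an assumption: the disable variable wins over both the AMD default and the enable variable, and only the presence of each variable is checked, not its value.

```cpp
#include <cstdlib>

// Hypothetical vendor enum for this sketch; not the GROMACS type.
enum class DeviceVendor { Amd, Nvidia, Other };

// Pure decision logic (assumed precedence): a disable request always wins;
// otherwise prefetching is on by default for AMD and can be forced on for
// any other vendor with an enable request.
constexpr bool useIAtomPrefetch(DeviceVendor vendor, bool disableRequested, bool enableRequested)
{
    return !disableRequested && (vendor == DeviceVendor::Amd || enableRequested);
}

// Reads the two environment variables named above; only their presence is
// checked, the value is ignored (an assumption of this sketch).
inline bool useIAtomPrefetchFromEnv(DeviceVendor vendor)
{
    const bool disable = std::getenv("GMX_OCL_DISABLE_I_PREFETCH") != nullptr;
    const bool enable  = std::getenv("GMX_OCL_ENABLE_I_PREFETCH") != nullptr;
    return useIAtomPrefetch(vendor, disable, enable);
}
```

Keeping the decision in a `constexpr` function separate from the `getenv` calls makes the per-vendor default easy to check without touching the process environment.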
Change-Id: I8324d62d3d78e0a1577dd3125edf059d3b311c2f