128-bit AVX kernels, with FMA instructions for Opteron bulldozer
author Erik Lindahl <erik@kth.se>
Thu, 22 Nov 2012 22:18:05 +0000 (23:18 +0100)
committer Gerrit Code Review <gerrit@gerrit.gromacs.org>
Wed, 28 Nov 2012 18:04:41 +0000 (19:04 +0100)
commit 777838f3fb94d6193afa986afb468c1b4367dacd
tree d44e2abf2cfb8f97e28b222c2e367442de7a402d
parent 3b21fcb9e8c0bb7e149795f8b8c7d892ad4adf53
These kernels primarily enable the FMA (fused multiply-add)
instructions available on modern AMD hardware,
but also use some other AMD-specific instructions and
optimizations. Because FMA is available, it is
slightly faster to use the analytical form of our Ewald
correction instead of a table. I have also corrected a
sign error in the comment on the analytical PME correction
(the code itself was fine).

Change-Id: Ief0fc0c2433e02ecea572c1e83b9a2493d73e853
123 files changed:
CMakeLists.txt
include/gmx_math_x86_avx_128_fma_single.h
include/gmx_math_x86_avx_256_double.h
include/gmx_math_x86_avx_256_single.h
include/gmx_math_x86_sse2_double.h
include/gmx_math_x86_sse2_single.h
include/gmx_math_x86_sse4_1_double.h
include/gmx_math_x86_sse4_1_single.h
include/gmx_x86_avx_128_fma.h
src/gmxlib/nonbonded/CMakeLists.txt
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/kernelutil_x86_avx_128_fma_single.h [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/make_nb_kernel_avx_128_fma_single.py [new file with mode: 0755]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwCSTab_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwCSTab_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwCSTab_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwCSTab_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwCSTab_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwLJ_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwLJ_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwLJ_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwLJ_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwLJ_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwNone_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwNone_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwNone_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwNone_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCSTab_VdwNone_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwCSTab_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwCSTab_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwCSTab_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwCSTab_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwCSTab_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwLJ_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwLJ_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwLJ_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwLJ_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwLJ_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwNone_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwNone_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwNone_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwNone_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecCoul_VdwNone_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwLJSh_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwLJSh_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwLJSh_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwLJSh_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwNone_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwNone_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwNone_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwNone_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSh_VdwNone_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwLJSw_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwLJSw_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwLJSw_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwLJSw_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwLJSw_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwNone_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwNone_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwNone_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwNone_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEwSw_VdwNone_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwCSTab_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwCSTab_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwCSTab_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwCSTab_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwCSTab_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwLJ_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwLJ_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwLJ_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwLJ_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwLJ_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwNone_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwNone_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwNone_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwNone_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecEw_VdwNone_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecGB_VdwCSTab_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecGB_VdwLJ_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecGB_VdwNone_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecNone_VdwCSTab_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecNone_VdwLJSh_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecNone_VdwLJSw_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecNone_VdwLJ_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwCSTab_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwCSTab_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwCSTab_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwCSTab_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwCSTab_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSh_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSh_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSh_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSh_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSh_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSw_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSw_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSw_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSw_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwLJSw_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwNone_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwNone_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwNone_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwNone_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRFCut_VdwNone_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwCSTab_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwCSTab_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwCSTab_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwCSTab_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwCSTab_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwLJ_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwLJ_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwLJ_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwLJ_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwLJ_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwNone_GeomP1P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwNone_GeomW3P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwNone_GeomW3W3_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwNone_GeomW4P1_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_ElecRF_VdwNone_GeomW4W4_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_avx_128_fma_single.c [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_avx_128_fma_single.h [new file with mode: 0644]
src/gmxlib/nonbonded/nb_kernel_avx_128_fma_single/nb_kernel_template_avx_128_fma_single.pre [new file with mode: 0644]
src/gmxlib/nonbonded/nonbonded.c