Redesigned SIMD module and unit tests.

Second part of the commit (after renaming), which contains
the actual functionality changes. This new version
implements a complete interface layer that
is architecture-agnostic, while each architecture has a
separate implementation file with single, double, and
simd4 versions (which can be used simultaneously). simd.h
contains a number of defines that describe the capabilities
of the instruction set, and there is a new documentation
module for Doxygen. The new SIMD layer will be used in a later patch
for modularized Verlet kernels. With that, we hope to remove
all architecture-specific SIMD code from the rest of Gromacs.
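
As a rough illustration of the layering described above (not the actual
Gromacs code), the sketch below uses a capability define, one
architecture-specific backend, and a plain-C reference backend behind the
same interface. All names prefixed with sketch_/SKETCH_ are hypothetical
and chosen only for this example; the kernel at the bottom is written once
against the interface and contains no intrinsics.

```c
#include <stddef.h>

#if defined(__SSE__) && !defined(SKETCH_SIMD_REFERENCE)
/* Backend 1: SSE single precision (would live in its own file). */
#include <xmmintrin.h>
#define SKETCH_SIMD_HAVE_HARDWARE 1
#define SKETCH_SIMD_FLOAT_WIDTH   4
typedef __m128 sketch_simd_float_t;

static inline sketch_simd_float_t sketch_simd_load_f(const float *m)
{
    return _mm_loadu_ps(m);
}
static inline void sketch_simd_store_f(float *m, sketch_simd_float_t a)
{
    _mm_storeu_ps(m, a);
}
static inline sketch_simd_float_t sketch_simd_add_f(sketch_simd_float_t a,
                                                    sketch_simd_float_t b)
{
    return _mm_add_ps(a, b);
}
static inline sketch_simd_float_t sketch_simd_mul_f(sketch_simd_float_t a,
                                                    sketch_simd_float_t b)
{
    return _mm_mul_ps(a, b);
}
#else
/* Backend 2: plain-C reference with the same interface. */
#define SKETCH_SIMD_HAVE_HARDWARE 0
#define SKETCH_SIMD_FLOAT_WIDTH   4
typedef struct { float r[SKETCH_SIMD_FLOAT_WIDTH]; } sketch_simd_float_t;

static inline sketch_simd_float_t sketch_simd_load_f(const float *m)
{
    sketch_simd_float_t a;
    int                 i;
    for (i = 0; i < SKETCH_SIMD_FLOAT_WIDTH; i++) { a.r[i] = m[i]; }
    return a;
}
static inline void sketch_simd_store_f(float *m, sketch_simd_float_t a)
{
    int i;
    for (i = 0; i < SKETCH_SIMD_FLOAT_WIDTH; i++) { m[i] = a.r[i]; }
}
static inline sketch_simd_float_t sketch_simd_add_f(sketch_simd_float_t a,
                                                    sketch_simd_float_t b)
{
    int i;
    for (i = 0; i < SKETCH_SIMD_FLOAT_WIDTH; i++) { a.r[i] += b.r[i]; }
    return a;
}
static inline sketch_simd_float_t sketch_simd_mul_f(sketch_simd_float_t a,
                                                    sketch_simd_float_t b)
{
    int i;
    for (i = 0; i < SKETCH_SIMD_FLOAT_WIDTH; i++) { a.r[i] *= b.r[i]; }
    return a;
}
#endif

/* Architecture-agnostic kernel: y[i] = a[i]*x[i] + y[i] for i in [0, n). */
static void sketch_fmadd_array(const float *a, const float *x, float *y, size_t n)
{
    size_t i = 0;
    /* Full SIMD-width chunks go through the interface layer. */
    for (; i + SKETCH_SIMD_FLOAT_WIDTH <= n; i += SKETCH_SIMD_FLOAT_WIDTH)
    {
        sketch_simd_float_t va = sketch_simd_load_f(a + i);
        sketch_simd_float_t vx = sketch_simd_load_f(x + i);
        sketch_simd_float_t vy = sketch_simd_load_f(y + i);
        sketch_simd_store_f(y + i, sketch_simd_add_f(sketch_simd_mul_f(va, vx), vy));
    }
    /* Scalar tail for the remaining elements. */
    for (; i < n; i++)
    {
        y[i] = a[i] * x[i] + y[i];
    }
}
```

Defining SKETCH_SIMD_REFERENCE at compile time selects the reference
backend, which is one way the same kernel and tests could be exercised
on hardware without the target instruction set.
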
All SIMD math functions have been redesigned so they work even
with instruction sets that do not support integers (and even
on sets that do not support logical operations), and
accuracy has been improved for double precision sincos() by
removing the table implementation. To try to reduce
the size of this relatively large patch, I have kept a few
header files (in particular the math files in gromacs/simd)
to avoid touching all the group kernels. With this new kernel
module, 256-bit AVX2 SIMD acceleration will now automatically be
enabled for the verlet kernels. Group kernels will use AVX_256
in this case. Also incorporates changes from Teemu to make
static and gmx_inline library functions appear correctly in
the Doxygen documentation.

Relocated the nbnxn SIMD setup from nb_verlet.h to nbnxn_simd.h
in mdlib (by Berk). The nbnxn SIMD setup is now completely internal.
Replaced the pr4 functions in the nbnxn kernels by simd4.
The nbnxn kernel selection is now nearly architecture agnostic.
Also enabled FMA again for pmecorr SIMD functions in double precision.

Change-Id: I643da75f346f120500682bcc4bcc1333a635db70