BlueGene/Q Verlet cut-off scheme kernels
The kernels are implemented as small functions whose inlining is
forced with xlc and clang extensions. That is a hack; I plan to
implement a general solution in the master branch.
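For illustration only, forced inlining of a small helper could look
roughly like the sketch below. The macro name FORCE_INLINE and the
helper sq_dist3 are hypothetical, and the GCC-style always_inline
attribute is an assumption about the kind of extension meant here,
not a quote from the patch.

```c
/* Hypothetical macro: request unconditional inlining where the compiler
 * offers a GCC-style attribute, and fall back to a plain inline hint
 * elsewhere. */
#if defined(__GNUC__) || defined(__clang__) || defined(__IBMC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Hypothetical small helper of the kind a kernel is composed from:
 * squared distance between two 3-vectors. */
FORCE_INLINE float sq_dist3(const float *a, const float *b)
{
    float dx = a[0] - b[0];
    float dy = a[1] - b[1];
    float dz = a[2] - b[2];

    return dx*dx + dy*dy + dz*dz;
}
```

The point of forcing the inline is that a kernel built from many such
helpers should compile to one flat loop body with no call overhead,
even where the compiler's own heuristics would decline to inline.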
Other BG/Q considerations:
Architecture detection now works on A2 core.
Install guide updated.
It is better to use intra-node communicators. Ranks within a node
are now detected correctly by querying the BlueGene/Q API, because
the hostname is not useful for that purpose.
It is better to leave GMX_DD_SENDRECV2 unset.
It is better to use the analytical Ewald correction.
In principle, we should make the type of the variables and fields
named d2, rl2, rbb2 in nbnxn_search*[ch] platform-dependent: double on
PowerPC and float everywhere else (each regardless of GROMACS target
precision). On PowerPC, all flops take place in double precision and
precision-extension upon load is free. We could then be both
cache-efficient, by storing bounding boxes in float, and
flop-efficient, by not having to generate a round-to-single
instruction to compare the result of subc_bb_dist2_simd4 with the
cut-off stored as a float. Still, a flop per bounding-box distance
comparison will not break the bank.
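A minimal scalar sketch of that idea follows. It is not part of this
change. The names nbnxn_dist_t, bb_t, bb_dist2 and bb_in_range are
hypothetical, and the preprocessor test for PowerPC is an assumption
about which macros the target compilers define.

```c
/* Hypothetical typedef: distances and cut-offs are double on PowerPC,
 * float everywhere else, independent of the GROMACS target precision. */
#if defined(__PPC__) || defined(__powerpc__) || defined(__PPC64__) || defined(__powerpc64__)
typedef double nbnxn_dist_t;
#else
typedef float nbnxn_dist_t;
#endif

/* Bounding boxes stay in float for cache efficiency. */
typedef struct {
    float lower[3];
    float upper[3];
} bb_t;

/* Squared distance between two axis-aligned bounding boxes,
 * zero when they overlap along every dimension. */
static nbnxn_dist_t bb_dist2(const bb_t *a, const bb_t *b)
{
    nbnxn_dist_t d2 = 0;
    int          d;

    for (d = 0; d < 3; d++)
    {
        /* Floats are widened on load; no rounding back to single is needed. */
        nbnxn_dist_t dl = (nbnxn_dist_t)a->lower[d] - (nbnxn_dist_t)b->upper[d];
        nbnxn_dist_t dh = (nbnxn_dist_t)b->lower[d] - (nbnxn_dist_t)a->upper[d];
        nbnxn_dist_t dm = (dl > dh) ? dl : dh;

        if (dm < 0)
        {
            dm = 0;
        }
        d2 += dm*dm;
    }

    return d2;
}

/* The cut-off has the same type as the distance, so the comparison
 * needs no precision conversion on either platform. */
static int bb_in_range(const bb_t *a, const bb_t *b, nbnxn_dist_t rbb2)
{
    return bb_dist2(a, b) < rbb2;
}
```

With the cut-off held in the same type as the computed distance, the
compiler has no reason to round the distance to single before the
comparison on PowerPC, while other platforms keep everything in float.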
Enough bgclang support exists for the build to succeed (no platform
file is required), even with OpenMP, but a number of compiler issues
have been reported on the llvm-bgq-discuss mailing list.
Change-Id: I98c5791ec3766cdbdcb8a8eb7418d00585727cc0
39 files changed: