BioD PNPI Git Repos - alexxy/gromacs.git/commit

author	Szilard Pall <pall.szilard@gmail.com>
	Wed, 24 Jun 2015 21:19:46 +0000 (23:19 +0200)
committer	Gerrit Code Review <gerrit@gerrit.gromacs.org>
	Sat, 27 Jun 2015 22:27:19 +0000 (00:27 +0200)
commit	adbada47acec11923abb49678aae1a5437c9322b
tree	0025b7fc815afa866210a31d511a90ddef22cbd1
parent	a34ffa2558c24b86edad515d43c82fca3c07e3ce

Fix CUDA architecture dependent issues

Only device code is generated in multiple passes, so target
architecture-dependent macros such as __CUDA_ARCH__ or our own
IATYPE_SHMEM (which also depends on __CUDA_ARCH__) are not usable in
host code, where both are undefined. As a result, the code has been
over-allocating dynamic shared memory, which has no negative side effects.
This change replaces the use of these macros in host code with runtime
device compute capability checks. Texture objects are now also actually
enabled, which gives a very minor performance improvement.
Note that on Maxwell + CUDA 7.0 there is a 20% performance regression
for the tabulated Ewald kernel (which is not used by default); this
regression magically disappears when texture references are used instead.
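
A minimal host-side sketch of the kind of runtime check described above,
not the code from this commit. DeviceCapability, kAtomsPerBlock,
sharedMemBytes and the buffer sizes are hypothetical names and values
chosen for illustration; which buffers each architecture needs is also an
assumption. The point is only that the branch is taken on a value queried
at run time, because in host code an undefined __CUDA_ARCH__ evaluates to
0 inside #if and cannot select a per-architecture branch.

```cpp
#include <cstddef>

// Hypothetical stand-in for the major/minor compute capability fields
// that a device query reports.
struct DeviceCapability
{
    int major;
    int minor;
};

// Illustrative block size only; not a value taken from the real kernels.
constexpr std::size_t kAtomsPerBlock = 64;

// Computes the dynamic shared memory size for a kernel launch on the host.
// The architecture decision uses the runtime capability, not __CUDA_ARCH__.
inline std::size_t sharedMemBytes(const DeviceCapability& cap)
{
    // Buffer assumed to be needed on every architecture: 4 floats per atom.
    std::size_t bytes = kAtomsPerBlock * 4 * sizeof(float);

    if (cap.major >= 3)
    {
        // Assumption: newer devices additionally stage one int per atom.
        bytes += kAtomsPerBlock * sizeof(int);
    }
    else
    {
        // Assumption: older devices need a 3-float reduction buffer per atom.
        bytes += kAtomsPerBlock * 3 * sizeof(float);
    }
    return bytes;
}
```

With the CUDA runtime, the major and minor values would come from the
cudaDeviceProp structure filled in by cudaGetDeviceProperties(), and the
returned size would be passed as the third kernel launch configuration
argument.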

Change-Id: I1f911caad85eb38d6a8e95f3b3923561dbfccd0e

src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda.cu
src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_data_mgmt.cu
src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_kernel.cuh
src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_kernel_utils.cuh