CUDA version of SETTLE algorithm with basic tests
CUDA-based GPU implementation of SETTLE. This is part of the
all-GPU loop. It can work in isolation from other parts of the
code, since coordinates are copied to (from) the device before
(after) the SETTLE kernel call. The velocity update and the
virial evaluation can each be enabled.
To enable, set the GMX_SETTLE_GPU environment variable.
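A minimal sketch of how such an opt-in check could look, assuming
that the variable only needs to be set and that its value is
ignored. The helper name settleGpuRequested is hypothetical and is
not taken from this change:

```cpp
#include <cstdlib>

// Hypothetical helper, not part of the GROMACS sources: returns true
// when GMX_SETTLE_GPU is present in the environment. The value is
// ignored (assumption), so an empty string also counts as "set".
bool settleGpuRequested()
{
    return std::getenv("GMX_SETTLE_GPU") != nullptr;
}
```

With a check of this kind, the GPU path is taken only when the
variable is set, and the CPU SETTLE path remains the default.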
Limitations:
1. Does not work when domain decomposition is enabled.
2. Projection of the derivative is not implemented.
3. Not fully integrated/unified with the CPU version.
TODOs:
1. Multi-GPU case.
2. Better virial reduction. This is a more general feature,
not only related to constraints.
3. More cleanup in constr.cpp needed.
4. Better unit tests.
Refs #2816, #2886
Change-Id: I218e1bf1f86a2351e189e3c27f950f45c06135a4