With thread-MPI, mdrun chose the number of OpenMP threads so that
all available hardware threads were used. When the number of ranks
was limited by the system size, this led to excessively high OpenMP
thread counts, which lowered efficiency. A limit is now imposed on
the OpenMP thread count.
Also updated some comments and renamed constants and bNTOptSet.

Change-Id: I830b5a3f2fd28f87acfbcf982103b62fc3e45758