"With thread-MPI there are additional options [TT]-nt[tt], which sets",
"the total number of threads, and [TT]-ntmpi[tt], which sets the number",
"of thread-MPI threads.",
- "Note that using combined MPI+OpenMP parallelization is almost always",
- "slower than single parallelization, except at the scaling limit, where",
- "especially OpenMP parallelization of PME reduces the communication cost.",
- "OpenMP-only parallelization is much faster than MPI-only parallelization",
+ "The number of OpenMP threads used by [TT]mdrun[tt] can also be set with",
+ "the standard environment variable, [TT]OMP_NUM_THREADS[tt].",
+ "The [TT]GMX_PME_NUM_THREADS[tt] environment variable can be used to specify",
+ "the number of threads used by the PME-only processes.[PAR]",
+ "Note that combined MPI+OpenMP parallelization is in many cases",
+ "slower than either on its own. However, at high parallelization, using the",
+ "combination is often beneficial as it reduces the number of domains and/or",
+ "the number of MPI ranks. (Fewer and larger domains can improve scaling;",
+ "with separate PME processes, fewer MPI ranks reduce communication cost.)",
+ "OpenMP-only parallelization is typically faster than MPI-only parallelization",
"on a single CPU(-die). Since we currently don't have proper hardware",
"topology detection, [TT]mdrun[tt] compiled with thread-MPI will only",
"automatically use OpenMP-only parallelization when you use up to 4",