clarified OpenMP-related things in mdrun help/man
author Szilárd Páll <pszilard@cbr.su.se>
Tue, 26 Feb 2013 22:06:42 +0000 (23:06 +0100)
committer Gerrit Code Review <gerrit@gerrit.gromacs.org>
Tue, 28 Jan 2014 10:32:07 +0000 (11:32 +0100)
Added a note on the OMP_NUM_THREADS/GMX_PME_NUM_THREADS env vars and
improved the description of the use cases where MPI+OpenMP improves
performance.

Change-Id: I904f00c8a4b6907a006b9d4367406d3fa3f3ce42

src/kernel/mdrun.c

index 2ccc6f67469c0aab5263ec48636818785073f661..d0645b3bf5f715e56cc71ac6bb9d9e967f8a9bc1 100644
@@ -100,10 +100,16 @@ int cmain(int argc, char *argv[])
         "With thread-MPI there are additional options [TT]-nt[tt], which sets",
         "the total number of threads, and [TT]-ntmpi[tt], which sets the number",
         "of thread-MPI threads.",
-        "Note that using combined MPI+OpenMP parallelization is almost always",
-        "slower than single parallelization, except at the scaling limit, where",
-        "especially OpenMP parallelization of PME reduces the communication cost.",
-        "OpenMP-only parallelization is much faster than MPI-only parallelization",
+        "The number of OpenMP threads used by [TT]mdrun[tt] can also be set with",
+        "the standard environment variable, [TT]OMP_NUM_THREADS[tt].",
+        "The [TT]GMX_PME_NUM_THREADS[tt] environment variable can be used to specify",
+        "the number of threads used by the PME-only processes.[PAR]",
+        "Note that combined MPI+OpenMP parallelization is in many cases",
+        "slower than either on its own. However, at high parallelization, using the",
+        "combination is often beneficial as it reduces the number of domains and/or",
+        "the number of MPI ranks. (Fewer and larger domains can improve scaling;",
+        "with separate PME processes, using fewer MPI ranks reduces communication cost.)",
+        "OpenMP-only parallelization is typically faster than MPI-only parallelization",
         "on a single CPU(-die). Since we currently don't have proper hardware",
         "topology detection, [TT]mdrun[tt] compiled with thread-MPI will only",
         "automatically use OpenMP-only parallelization when you use up to 4",
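The help text above says the OpenMP thread count can come from OMP_NUM_THREADS and that GMX_PME_NUM_THREADS applies to PME-only processes. The following is a minimal sketch of how such environment-based resolution could work, not mdrun's actual code. The helper names (env_thread_count, resolve_threads), the precedence (GMX_PME_NUM_THREADS first for PME-only processes, then OMP_NUM_THREADS, then a caller-supplied default) and the upper bound of 1024 threads are all hypothetical assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse a positive thread count from the named
 * environment variable. Returns 0 if it is unset, empty, non-numeric,
 * non-positive, or above an assumed sanity limit of 1024. */
int env_thread_count(const char *name)
{
    const char *val;
    char       *end;
    long        n;

    assert(name != NULL);
    val = getenv(name);
    if (val == NULL || *val == '\0')
    {
        return 0;
    }
    n = strtol(val, &end, 10);
    if (*end != '\0' || n <= 0 || n > 1024)
    {
        return 0; /* reject trailing junk and out-of-range values */
    }
    return (int)n;
}

/* Hypothetical precedence: PME-only processes prefer GMX_PME_NUM_THREADS,
 * every process then falls back to OMP_NUM_THREADS, and finally to
 * default_threads when neither variable holds a valid value. */
int resolve_threads(int is_pme_only, int default_threads)
{
    int n = 0;

    if (is_pme_only)
    {
        n = env_thread_count("GMX_PME_NUM_THREADS");
    }
    if (n == 0)
    {
        n = env_thread_count("OMP_NUM_THREADS");
    }
    return n > 0 ? n : default_threads;
}
```

For example, with OMP_NUM_THREADS=4 and GMX_PME_NUM_THREADS=2 set, this sketch gives 4 threads to non-PME processes and 2 to PME-only processes; with only OMP_NUM_THREADS=4 set, both get 4.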