"[PAR]",
"When PME is used with domain decomposition, separate nodes can",
"be assigned to do only the PME mesh calculation;",
- "this is computationally more efficient starting at about 12 nodes.",
+ "this is computationally more efficient starting at about 12 nodes",
+ "or even fewer when OpenMP parallelization is used.",
"The number of PME nodes is set with option [TT]-npme[tt];",
"this cannot be more than half of the nodes.",
"By default [TT]mdrun[tt] makes a guess for the number of PME",
- "nodes when the number of nodes is larger than 11 or performance wise",
- "not compatible with the PME grid x dimension.",
- "But the user should optimize npme. Performance statistics on this issue",
+ "nodes when the number of nodes is larger than 16. With GPUs,",
+ "PME nodes are not selected automatically, since the optimal setup",
+ "depends very much on the details of the hardware.",
+ "In all cases you might gain performance by optimizing [TT]-npme[tt].",
+ "Performance statistics on this issue",
"are written at the end of the log file.",
"For good load balancing at high parallelization, the PME grid x and y",
"dimensions should be divisible by the number of PME nodes",