When separate PME ranks have more load than the PP ranks, DLB can
not improve improve performance. It will actually make it worse,
because the PME x/f redistribution time goes up. Now with -dlb=auto
DLB is not turned on in this situation.
Also DLB is not activated during PME tuning with GPUs and separate
PME nodes, since it then nearly always deteriorates the performance.
Change-Id: I1f5e649a9562fdca9ba538196f41a12feb0a4a24