Remove thread-MPI limitation for GPU direct PME-PP communication