Remove hardcoded warp_size == 32 assumption from PME GPU