Use subgroup for warp_any and CJ4 prefetch