Improved the intra-GPU load balancing