Improved the nbnxn buffer size estimate with GPUs