Add FloatN aliases to CUDA and use them in NBNXM