Do you have a good comparison point? Hopefully not just what they could do a decade ago. I'm assuming they didn't sit still. Did they?

I question whether it is just 100x compute. Feels like more, since NaturallySpeaking and friends didn't hog the machine. Again, that was over a full decade ago.

Moreover, the resources Google has to throw at training are ridiculous. Well over 100x what was used to build the old models.

None of this is to say we should pack up and go back to a decade ago. I just worry we'll do the opposite and ignore the progress that was made a decade ago in favor of the new tricks alone.



The thing is, it is not simply the training: the inference alone requires an incredible amount of compute compared to the old way of doing it.
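For a rough sense of the gap, here's a toy back-of-envelope sketch in Python. Every number in it (feature dimension, Gaussian count, LSTM depth and width) is an assumption picked for illustration, not a figure from any real system:

    # Toy FLOPs-per-frame comparison: GMM acoustic scoring vs. an
    # LSTM stack. All sizes below are assumed placeholders.
    FEAT_DIM = 40          # assumed acoustic feature dimension
    GAUSSIANS = 20_000     # assumed Gaussians scored per 10 ms frame
    gmm_flops = GAUSSIANS * 3 * FEAT_DIM   # ~3 FLOPs/dim, diagonal covariance

    LAYERS, HIDDEN = 5, 1024               # assumed LSTM depth and width
    # 4 gates, each ~(input + hidden) x hidden MACs, 2 FLOPs per MAC
    lstm_flops = LAYERS * 4 * 2 * (2 * HIDDEN) * HIDDEN

    print(f"GMM : {gmm_flops / 1e6:6.1f} MFLOPs/frame")
    print(f"LSTM: {lstm_flops / 1e6:6.1f} MFLOPs/frame")
    print(f"ratio: ~{lstm_flops / gmm_flops:.0f}x")

Even with generous numbers for the old pipeline, the neural stack comes out tens of times heavier per frame, and that's before any language-model rescoring.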

Hope Google will do a paper like they did with the Gen 1 TPUs. Would love to see the difference in terms of joules per word spoken.
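In the meantime you can get a crude estimate yourself. A minimal sketch, where every input (chip power, decode speed, speaking rate) is an assumed placeholder rather than a published TPU figure:

    # Crude joules-per-spoken-word estimate; all inputs are assumptions.
    chip_watts = 75.0          # assumed average accelerator board power
    realtime_factor = 50.0     # assumed seconds of audio decoded per second
    words_per_audio_sec = 2.5  # assumed speaking rate (~150 words/minute)

    words_per_sec = realtime_factor * words_per_audio_sec
    joules_per_word = chip_watts / words_per_sec
    print(f"~{joules_per_word:.2f} J per word spoken")

Swap in real power and throughput numbers from a paper and the same arithmetic gives a comparable joules-per-word figure.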



