
This is definitely current models' biggest issue. You're training a model on millions of books' worth of data (which would take a human tens of thousands of lifetimes to read) just to reach a superficial level of conversational ability matching a human, who can consume at most three novels a day without compromising comprehension. Current models are terribly inefficient at learning from data.


Modern LLMs are nowhere near the scale of the human brain however you want to slice things, so "terribly inefficient" is very arguable. Also, language skills seemingly take much less data and scale when you aren't trying to have the model learn the sum total of human knowledge: https://arxiv.org/abs/2305.07759


Scale is a very subjective thing here, since one system is analog (86B neurons) and one is digital (175B parameters). Additionally, consider how many compute hours GPT-3 took to train (10,000 V100s were set aside exclusively for training it). I'd say that GPT-3's training scale vastly dwarfs the human brain, which runs on a paltry ~12 watts.


Von Neumann's book The Computer and the Brain is way out of date in terms of today's hardware, but funnily enough it is still relevant on this metric. Biological systems are more analogous to a distributed system of many small, very slow CPUs. Even GPUs, which somewhat close the gap between the few crazy-fast CPUs and the aforementioned many slow ones, are still much faster than any single neuron at calculations, yet remain overly serial. It is not the number of CPUs but the number of their connections that makes biological systems so powerful.


Parameters represent connections too, though. If the next layer is 1000 units wide, each unit in the current layer has 1000 outgoing connections, i.e. 1000 parameters.
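To make the counting concrete, here's a toy sketch (my own illustration, not from any particular framework) of how connection/parameter counts multiply in a stack of fully connected layers:

```python
# Toy illustration: in a fully connected layer, every unit in one layer
# connects to every unit in the next, so the weight (connection) count
# is the product of the two layer widths.
def dense_connections(layer_widths):
    """Total number of weights across a stack of fully connected layers."""
    return sum(a * b for a, b in zip(layer_widths, layer_widths[1:]))

# One unit feeding a 1000-wide layer contributes 1000 connections;
# a 1000 -> 1000 pair of layers already holds a million weights.
print(dense_connections([1, 1000]))     # 1000
print(dense_connections([1000, 1000]))  # 1000000
```

So even a modest parameter count implies a very large number of pairwise connections, which is the quantity the parent comment is comparing against biological wiring.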


You have to count the training process from the origin of the human brain imo, not from the birth of any individual human.

Neural nets look much more competitive by that standard.


Yet humans designed the models, so by your logic the training process for ChatGPT etc. includes human evolution too.


This is a good point, and the level of task-specific "inductive bias" in models is an active point of discussion. But I don't think it's fair to count all of human evolution toward the model's inductive bias, because most of evolution was not aimed at giving the model a better understanding of language; it was aimed at a better understanding of language in humans.


They are inefficient by design. Gradient descent and backpropagation scale poorly, but they work and GPUs are cheap, so here we are.
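The "works, but scales poorly" loop described above can be sketched in a few lines. This is a deliberately tiny, hypothetical example (one weight, one data point), not any real model's training code:

```python
# Minimal sketch of gradient descent with backpropagation:
# fit y = w * x to a single data point by repeatedly stepping
# against the gradient of the squared error.
def train(x=2.0, y_true=6.0, w=0.0, lr=0.05, steps=100):
    for _ in range(steps):
        y_pred = w * x                    # forward pass
        grad = 2 * (y_pred - y_true) * x  # backprop: d(loss)/dw
        w -= lr * grad                    # gradient descent update
    return w

print(train())  # converges toward w = 3.0, since 3.0 * 2.0 = 6.0
```

Real networks do exactly this, just with billions of weights updated per step over trillions of tokens, which is where the brute-force inefficiency comes from.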



