
This is definitely current models' biggest issue. You're training a model on millions of books' worth of data (which would take a human tens of thousands of lifetimes to read) just to reach a superficial level of conversational ability matching a human, who can consume at most three novels a day without compromising comprehension. Current models are terribly inefficient at learning from data.


Modern LLMs are nowhere near the scale of the human brain however you want to slice things, so "terribly inefficient" is very arguable. Also, language skills seemingly take much less data and scale when you aren't trying to have the model learn the sum total of human knowledge: https://arxiv.org/abs/2305.07759


Scale is a very subjective thing here, since one system is analog (86B neurons) and one is digital (175B parameters). Additionally, consider how many compute hours GPT-3 took to train (10,000 V100s were set aside exclusively for training it). I'd say that GPT-3's training scale vastly dwarfs the human brain, which runs on a paltry ~12 watts.


Von Neumann's book The Computer and the Brain is way out of date in terms of today's hardware, but funnily enough it is still relevant on this metric. Biological systems are more analogous to a distributed system of many small, very slow CPUs. Even GPUs, which somewhat close the gap between the few crazy-fast CPUs and the aforementioned many slow ones, are still much faster than any single neuron at calculations, yet remain overly serial. It is not the number of CPUs but the number of their connections that makes biological systems so powerful.


Parameters represent connections too, though. If the next layer is 1000 units wide, each unit in the current layer has 1000 outgoing connections, i.e. 1000 parameters.
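To make the counting concrete, here's a toy sketch (my own illustration, not from any particular framework) of how connection/parameter counts multiply in a stack of fully connected layers:

```python
# Toy illustration: in a fully connected layer, every unit in one layer
# connects to every unit in the next, so the weight (connection) count
# is the product of the two layer widths.
def dense_connections(layer_widths):
    """Total number of weights across a stack of fully connected layers."""
    return sum(a * b for a, b in zip(layer_widths, layer_widths[1:]))

# One unit feeding a 1000-wide layer contributes 1000 connections;
# a 1000 -> 1000 pair of layers already holds a million weights.
print(dense_connections([1, 1000]))     # 1000
print(dense_connections([1000, 1000]))  # 1000000
```

So even a modest parameter count implies a very large number of pairwise connections, which is the quantity the parent comment is comparing against biological wiring.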


You have to count the training process from the origin of the human brain imo, not from the birth of any individual human.

Neural nets look much more competitive by that standard.


Yet humans designed the models, so by your logic the training process for ChatGPT etc. includes human evolution too.


This is a good point, and the level of task-specific "inductive bias" in models is an active point of discussion. But I don't think it's fair to count all of human evolution toward the model's inductive bias, because most of evolution was not aimed at giving the model a better understanding of language; it was aimed at a better understanding of language in humans.


They are inefficient by design. Gradient descent and backpropagation scale poorly, but they work and GPUs are cheap, so here we are.
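The "works, but scales poorly" loop described above can be sketched in a few lines. This is a deliberately tiny, hypothetical example (one weight, one data point), not any real model's training code:

```python
# Minimal sketch of gradient descent with backpropagation:
# fit y = w * x to a single data point by repeatedly stepping
# against the gradient of the squared error.
def train(x=2.0, y_true=6.0, w=0.0, lr=0.05, steps=100):
    for _ in range(steps):
        y_pred = w * x                    # forward pass
        grad = 2 * (y_pred - y_true) * x  # backprop: d(loss)/dw
        w -= lr * grad                    # gradient descent update
    return w

print(train())  # converges toward w = 3.0, since 3.0 * 2.0 = 6.0
```

Real networks do exactly this, just with billions of weights updated per step over trillions of tokens, which is where the brute-force inefficiency comes from.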



