
Then you don't understand machine learning in any real way. Literally the 3rd or 4th thing you learn about ML is that for any given problem, there is an ideal model size. Just making the model bigger doesn't work, because of something called the curse of dimensionality. This is something we have discovered about every single problem and type of learning algorithm used in ML. For LLMs, we probably moved past the ideal model size about 18 months ago. From the POV of someone who actually learned ML in school (from the person who coined the term), I see no real reason to think that AGI will happen based upon the current techniques. Maybe someday. Probably not anytime soon.

PS The first thing you learn about ML is to compare your models against a random baseline, to make sure the model didn't degenerate during training.
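That sanity check can be sketched with scikit-learn's DummyClassifier as the random baseline. The dataset and model below are my own illustrative choices, not anything from this thread:

```python
from sklearn.datasets import load_digits
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random baseline: predicts a uniformly random class, so roughly 1/10
# accuracy on the 10 balanced digit classes.
baseline = DummyClassifier(strategy="uniform", random_state=0).fit(X_tr, y_tr)
model = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)

# If the trained model doesn't clearly beat the random baseline,
# something went wrong (degenerated) during training.
print("baseline:", baseline.score(X_te, y_te))
print("model:   ", model.score(X_te, y_te))
```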




Doesn’t sound like you paid all that much attention when learning ML. The curse of dimensionality doesn’t say that every problem has some ideal model size; it says that the amount of data needed to train scales with the size of the feature space. So if you take an LLM and make the network much larger without increasing the size of the input token vocabulary, you aren’t even subject to the curse of dimensionality. Beyond that, there’s a principle in ML theory that larger models are almost always better: the number of parameters in the model is the dimensionality of the space in which you’re running gradient descent, and with every added dimension, local optima become rarer.
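The curse of dimensionality being argued about here has a quick numerical illustration: as dimension grows, pairwise distances concentrate, so a fixed amount of data covers the space ever more sparsely. The sample count and dimensions below are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
contrasts = {}
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                      # 500 points in the unit cube
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances to the first point
    # Relative gap between the farthest and nearest neighbor; it collapses
    # as d grows, so "near" and "far" stop being meaningful.
    contrasts[d] = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:4d}  contrast={contrasts[d]:.2f}")
```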

> Literally the 3rd or 4th thing you learn about ML is that for any given problem, there is an ideal model size.

From my understanding, this is now outdated. The deep double descent research showed that although performance drops as you increase model size past a certain point, if you keep increasing it there is another threshold where it paradoxically starts improving again. From that point onwards, increasing the parameter count only further improves performance.


That isn't what that research says at all. It says that running the same training data through the model multiple times improves training. There is still an ideal model size; it is just affected by the total volume of training data.

https://arxiv.org/pdf/1912.02292 "We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better." That is the first sentence of the abstract. The first graph shown in the paper backs it up.

Looking into it further, it seems that typical LLMs are in the first descent regime anyway, so my original point is not very relevant to them. It also looks like the second descent doesn't always reach a lower loss than the first; that appears to depend on other factors as well.


Um, what? Are you interpreting scaling to mean adding parameters and nothing else?

I'm not entirely sure where you get your confidence that we've passed the ideal model size, but at least that's a clear prediction, so you should be able to tell if and when you are proven wrong.

Just for the record, do you care to put an actual number on something we won't go past?

[edit] Vibe check on user comes out as

    Contrarian 45%
    Pedantic 35%
    Skeptical 15%
    Direct  5%
That's got to be some sort of record.

Is there a tool or something that gives this vibe check? (Serious question)

How are you calculating that? Also, my 1000 foot view would see that "rating" as something most HN commenters would match.

It's comparatively few, really

for instance yours comes out as

Analytical 45%, Cynical 30%, Pedantic 15%, Melancholic 10%

and mine is

Philosophical 35%, Hardware-Obsessed 25%, Analytically Pedantic 20%, Retro-Nostalgic 15%, Anti-Ad Skeptic 5%

You should consider gathering all of your analysis and pedantry into one easy to manage neurosis.

It's from https://hn-wrapped.kadoa.com


> How are you calculating that?

He's using a tool that was shared on HN some time back that takes a username and generates those stats from the posts made.

When I last checked, of over 10k posts, it only uses a few dozen to calculate that score, so it is about as reliable as dowsing.

> Also, my 1000 foot view would see that "rating" as something most HN commenters would match.

Probably. Why else join a discussion if you're going to be a yes-man to every comment?


>When I last checked, of over 10k posts, it only uses a few dozen to calculate that score, so it is about as reliable as dowsing.

A few samples are sufficient when the signal is strong enough. The time-spent pie chart definitely reflects more of what the user has been doing recently.

Overall, not everybody comes out the same. Pedantry is strong, which I'm not really surprised about for a forum like this, but some users have personality traits of sufficient magnitude that you can guess what their result will be.

The vibe checks for the last 10 users who posted comments on HN are:

Contrarian 45%, Didactic 25%, Skeptical 15%, Analytical 10%, Adversarial 5%

Skeptical 45%, Analytical 30%, Contrarian 15%, Helpful 10%

Heretical 45%, Low-Level Pedantic 25%, Chaotic Helpful 15%, Hardware-Jaded 15%

Contrarian 45%, Pedantic 30%, Skeptical 15%, Helpful 10%

Helpful 75%, Nostalgic 15%, Appreciative 5%, Skeptical 5%

Defensive 45%, Intellectual Flexing 25%, Techno-Optimist 20%, Exasperated 10%

Skeptical 45%, Pragmatic 25%, Nostalgic 20%, Helpful 10%

Pedantic 45%, Helpful 25%, Techno-skeptic 20%, Nostalgic 10%

Pragmatic 40%, Nostalgic 25%, Opinionated 20%, Visionary 15%

Technically Precise 45%, Disillusioned 25%, Deeply Empathetic 15%, Anti-AI Crusader 15%

Obviously this won't be a representative sample of HN because it will vary by time of day and topics under discussion. It's sufficient to show that the community is not entirely homogeneous.


> From the POV of something who actually learned ML in school (from the person who coined the term)

Sounds like that was quite a while ago.



