I used to also believe along these lines but lately I'm not so sure.

I'm honestly shocked by the latest results we're seeing from Gemini 3 Deep Think, Opus 4.6, and Codex 5.3 in math, coding, abstract reasoning, etc. Deep Think just scored 84.6% on ARC-AGI-2 (https://deepmind.google/models/gemini/)! And those benchmark results are borne out by my own experimentation and testing with these models, most recently with Opus 4.6 doing things I would never have thought possible in codebases I'm working in.

These models are demonstrating an incredible capacity for logical abstract reasoning, at a level far beyond that of 99.9% of the world's population.

And then combine that with the latest video output we're seeing from Seedance 2.0 and others, showing an incredible level of image/video understanding and generation capability.

I was previously deeply skeptical that the architectures we have would be sufficient to get us to AGI, but that belief has been strongly rattled lately. Honestly, I think the greatest gap now is simply one of orchestration, data presentation, and in-context memory representations - that is, converting work done in the real world into formats/representations amenable to AI (text conversion, etc.) and keeping newly trained/taught information in context to support continual learning.


>These models are demonstrating an incredible capacity for logical abstract reasoning, at a level far beyond that of 99.9% of the world's population.

This, I think, is the key point that Altman and Amodei see, but it gets buried under accusations of hype. The frontier models absolutely blow away the majority of people on simple general tasks and reasoning. Run the last 50 decisions I've seen locally through Opus 4.6 or ChatGPT 5.2 and I might conclude I'd rather work with an AI than with the human intelligence.

It's a soft threshold: I think people saw these models spit out some answers during the first chat-with-an-LLM hype wave and missed that the majority of white-collar work (I mean all of it, not just that of top software industry architects and senior SWEs) now seems to come out better when the human is pushed further out of the loop. Humans are still useful for spreading out responsibility and accountability, for now, thankfully.


LLMs are very good at logical reasoning in bounded systems. They lack the wisdom to deal with unbounded systems efficiently, because they don't have a good sense of what they don't know or good priors on the distribution of the unexpected. I expect this will be very difficult to RL in.

> These models are demonstrating an incredible capacity for logical abstract reasoning, at a level far beyond that of 99.9% of the world's population.

And yet they have trouble knowing that a person should take their car to a car wash.

I also saw a college professor who put various AI models through all his exams for a freshman(?)-level class. Most failed; I think one barely passed, if I remember correctly.

I’ve been reading about people being shocked by how good things are for years now, but while there may be moments of what seems like incredible brilliance, there are also moments of profound stupidity. AI optimists seem to ignore these moments, but they are very real.

If someone on my team performed like AI, I wouldn’t trust them with anything.


> And yet they have trouble knowing that a person should take their car to a car wash.

SotA models don't.


So what's the underlying message here? Don't prepare?

To remain skeptical of extraordinary claims.

While I think 99.9% is overstating it, I can believe that number is strictly more than 1% at this point.


