> it's not clear to me based on the description how this could all be done efficiently.
Depends how you define efficiency. The power use of this rig is a lot less than the large data centers that serve trillion parameter models. The page suggests that the final dollar cost per request is an order of magnitude lower than the frontier models charge.
> But none of this helps you solve harder problems, or distinguish between a simple solution which is wrong, and a more complex solution which is correct.
It does because hallucinations and low confidence share characteristics in the embedding vector which the small neural network learns to recognize. And the fact that it continuously learns based on the feedback loop is pretty slick.
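To make the idea concrete, here is a minimal sketch of that kind of detector: a small logistic-regression classifier trained on per-response features. The feature names (mean token log-prob, mean token entropy) and the synthetic data are my assumptions for illustration, not the actual features or model from the page.

```python
import math
import random

def train_confidence_classifier(samples, labels, lr=0.5, epochs=300):
    """Logistic regression via SGD; label 1 = likely hallucination."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

random.seed(0)
# Hypothetical per-response features: (mean token log-prob, mean token entropy).
# Confident answers cluster at high log-prob / low entropy; hallucinations don't.
confident = [(random.gauss(-0.5, 0.3), random.gauss(0.5, 0.3)) for _ in range(50)]
hallucinated = [(random.gauss(-3.0, 0.5), random.gauss(3.0, 0.5)) for _ in range(50)]
X = confident + hallucinated
y = [0] * 50 + [1] * 50
w, b = train_confidence_classifier(X, y)
acc = sum((predict(w, b, x) > 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
```

The continuous-learning part is essentially the same inner loop: each piece of user feedback is one more labelled example, applied as a single SGD step.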
Agents need the ability to code but also to objectively and accurately evaluate whether changes resulted in real improvements. This requires skills with metrics and statistics. If they can make those reliable, then self-improvement is basically assured, on a long enough timeline.
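The statistics part can be as simple as asking whether a metric shift is distinguishable from benchmark noise. A sketch, using a permutation test on before/after scores (the numbers below are synthetic, purely for illustration):

```python
import random
import statistics

def permutation_test(before, after, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns (observed mean difference, p-value). A small p-value suggests
    the change produced a real shift rather than run-to-run noise.
    """
    rng = random.Random(seed)
    observed = statistics.mean(after) - statistics.mean(before)
    pooled = list(before) + list(after)
    n = len(before)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n:]) - statistics.mean(pooled[:n])
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, hits / n_perm

rng = random.Random(42)
before = [rng.gauss(0.70, 0.02) for _ in range(20)]  # e.g. pass rate per run
after = [rng.gauss(0.78, 0.02) for _ in range(20)]   # after the agent's change
diff, p = permutation_test(before, after)
```

An agent that only accepts changes when `p` clears a threshold is much harder to fool with noisy "improvements".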
This is how hyperagents work: they have the ability to measure improvement in both the meta-agent and the task agents. Their approach requires task agents to tackle tasks that can be empirically evaluated.
Personally, I don't think the 1.58-bit work is going to make it into the mainstream.
Not sure why you think fractional representations are only useful for training? Being able to natively compute in lower precisions can be a huge performance boost at inference time.
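As a toy illustration of the inference-time win: symmetric absmax int8 quantization cuts weight storage 4x versus fp32 (so less memory bandwidth per token), and hardware integer matmuls are typically much faster than float ones. This is a generic sketch, not any particular library's scheme:

```python
def quantize_int8(weights):
    """Symmetric absmax quantization to int8 levels in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.42, -1.73, 0.05, 0.91, -0.33, 1.20, -0.88, 0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The int8 values are what you'd actually store and feed to an integer kernel; the scale is the only float you keep per group.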
> They all make sense to me if we're trying to judge whether these tools are AGI, no?
As long as the mean and median human scores are clearly communicated, the scoring is fine. I think the human scores above would surprise people at first glance, even if they make sense once you think about it, so there's an argument to be made that scores can be misleading.
You could have a system where everyone is directly elected while keeping checks and balances, if voting were restricted: e.g., everyone can vote for a president/prime minister, but only non-teachers can vote for an education minister, and only non-finance people can vote for something like the Fed chief. The point being that the checks and balances now happen because other groups keep your group in check by voting.
Absolutely! That does keep some of the checks. You can do better than that though!
It's like on the Apollo missions, where some parts were made by two different manufacturers and worked in completely different ways.
Hybrid political systems are best. Of course if we like democracy (and most people do), then that should be the most common kind of component. But I'd still like to have some different paradigms mixed into the system. And that's exactly what most modern constitutions do, for better or for worse.
I'd personally go for a two-chamber system (like congress/senate or commons/lords), with one chamber being elected and the other being chosen by sortition.
Maybe also a 3rd chamber, where the weight of your vote is proportional to IQ (much more palatable in the EU than the US).
This sounds great! TurboQuant does KV cache compression using quantization via rotations, and ParoQuant [1] does weight compression using quantization via rotations! So we can get 4-bit weights that match bf16 precision, and the KV cache goes down to 3 bits per key. This brings larger models and long contexts into the range of "possibly runnable" on beefy consumer hardware.
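For anyone wondering why rotations help: a single outlier channel blows up the absmax scale and wastes most of the 4-bit levels on everything else, while an orthogonal rotation (e.g. a scaled Hadamard transform) smears the outlier across all channels before quantizing. This toy sketch is my own illustration of the principle, not TurboQuant's or ParoQuant's actual algorithm:

```python
import math

def hadamard(n):
    """Sylvester construction; H/sqrt(n) is an orthogonal rotation (n a power of 2)."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    s = 1.0 / math.sqrt(n)
    return [[v * s for v in row] for row in H]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def quant4(x):
    """Symmetric absmax 4-bit quantization (levels -7..7), returned dequantized."""
    scale = max(abs(v) for v in x) / 7.0
    return [max(-7, min(7, round(v / scale))) * scale for v in x]

def l2_err(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# One outlier channel dominates the absmax scale for everything else.
x = [10.0, 1.5, -1.2, 0.8, -2.0, 1.1, -0.7, 1.7]
H = hadamard(8)
Ht = [list(col) for col in zip(*H)]  # transpose = inverse for orthogonal H

err_plain = l2_err(x, quant4(x))
# Rotate, quantize in the rotated basis, rotate back.
x_rot_back = matvec(Ht, quant4(matvec(H, x)))
err_rot = l2_err(x, x_rot_back)
```

Because the rotation is orthogonal, the reconstruction error measured after rotating back is the same as in the rotated basis, and it comes out noticeably smaller than quantizing the raw vector.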