naasking's comments | Hacker News

> it's not clear to me based on the description how this could all be done efficiently.

Depends on how you define efficiency. The power use of this rig is a lot less than that of the large data centers serving trillion-parameter models, and the page suggests the final dollar cost per request is an order of magnitude lower than what the frontier models charge.


> But none of this helps you solve harder problems, or distinguish between a simple solution which is wrong, and a more complex solution which is correct.

It does, because hallucinations and low confidence share characteristics in the embedding vector which the small neural network learns to recognize. And the fact that it continuously learns from the feedback loop is pretty slick.
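To make that concrete, here's a toy sketch of the idea (entirely my own construction, not the project's actual code): a tiny logistic-regression probe over the embedding vector that predicts "likely hallucination" and keeps updating online as feedback arrives.

```python
import math
import random

class ConfidenceProbe:
    """Tiny online logistic-regression probe over an embedding vector.

    Predicts the probability that a response is a hallucination and
    takes one SGD step per piece of downstream feedback.
    """

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict(self, emb):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, emb))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, emb, hallucinated):
        # One SGD step on log-loss: w += lr * (y - p) * x
        err = (1.0 if hallucinated else 0.0) - self.predict(emb)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, emb)]
        self.b += self.lr * err

# Toy feedback loop: pretend feature 0 of the embedding marks low confidence.
random.seed(0)
probe = ConfidenceProbe(dim=4)
for _ in range(2000):
    emb = [random.gauss(0, 1) for _ in range(4)]
    probe.update(emb, hallucinated=emb[0] > 0)

print(probe.predict([3.0, 0, 0, 0]))   # high: flagged as likely hallucination
print(probe.predict([-3.0, 0, 0, 0]))  # low: passes
```

A real system would use richer features and a small MLP rather than a linear probe, but the feedback-loop structure is the same.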


Agents need the ability to code, but also to objectively and accurately evaluate whether changes resulted in real improvements. This requires skill with metrics and statistics. If they can make those reliable, then self-improvement is basically assured on a long enough timeline.
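By "skill with statistics" I mean something as simple as this sketch (my own illustration, with made-up benchmark numbers): before accepting a change, the agent runs a permutation test on before/after scores instead of eyeballing the means.

```python
import random
import statistics

def permutation_test(before, after, trials=10_000, seed=0):
    """Two-sample permutation test on the difference of means.

    Returns a p-value for the null hypothesis that the change made
    no difference, so the agent demands evidence rather than vibes.
    """
    rng = random.Random(seed)
    observed = statistics.mean(after) - statistics.mean(before)
    pooled = before + after
    n = len(before)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n:]) - statistics.mean(pooled[:n])
        if abs(diff) >= abs(observed):
            hits += 1
    return hits / trials

# Toy benchmark scores before and after a candidate "self-improvement".
before = [0.61, 0.58, 0.63, 0.60, 0.59, 0.62, 0.57, 0.61]
after = [0.66, 0.69, 0.65, 0.68, 0.70, 0.67, 0.66, 0.69]
print(permutation_test(before, after))  # tiny p-value: improvement is real
```

An agent that only merges changes clearing a threshold like p < 0.01 avoids fooling itself with noise, which is the failure mode that makes naive self-improvement loops diverge.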

This is how hyperagents work: they have the ability to measure improvement in both the meta agent and the task agents. Their approach requires task agents to tackle tasks that can be empirically evaluated.

Private companies are embedded in every healthcare system in the world, even public ones.

I'm talking about the extent of a single company's influence.

Newer quantization approaches are even better; 4 bits gets you no meaningful loss relative to FP16: https://github.com/z-lab/paroquant

Hopefully Microsoft keeps pushing BitNet too, so only "1.58" bits are needed.

I think fractional representations are only relevant for training at this point, and bf16 is sufficient there; no need for fp4 and such.
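For anyone unfamiliar with what "4-bit weights" actually means mechanically, here's a minimal sketch (mine, not ParoQuant's algorithm): symmetric per-group INT4 quantization, storing one FP scale per group plus integers in [-8, 7], dequantized on the fly.

```python
def quantize_int4(weights, group_size=8):
    """Symmetric per-group INT4 quantization: one float scale per group,
    each weight stored as an integer in [-8, 7]."""
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        scale = max(abs(x) for x in g) / 7 or 1.0  # avoid 0 for all-zero groups
        q = [max(-8, min(7, round(x / scale))) for x in g]
        groups.append((scale, q))
    return groups

def dequantize_int4(groups):
    return [scale * qi for scale, q in groups for qi in q]

w = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.02, 0.18]
w_hat = dequantize_int4(quantize_int4(w))
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
# Absent clamping, error per weight is bounded by half a quantization step.
print(max_err <= (0.5 / 7) / 2 + 1e-9)
```

Real schemes add learned rotations (as in ParoQuant) to tame outlier weights before this step, which is where most of the accuracy recovery comes from.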


Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406

In my personal opinion I don’t think the 1.58 bit work is going to make it into the mainstream.

Not sure why you think fractional representations are only useful for training? Being able to natively compute in lower precisions can be a huge performance boost at inference time.


> Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406

Indeed, but much better! More accurate, with lower time and space overhead, and it beats AWQ on almost every benchmark. I hope it becomes the standard.
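The core trick behind both SpinQuant-style and ParoQuant-style methods is rotating the weights with an orthogonal matrix before quantizing, so one outlier weight stops dominating the scale. A toy demonstration of my own (fixed Hadamard rotation, not the learned rotations the papers use):

```python
H = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]  # 4x4 Hadamard; H/2 is orthogonal and its own inverse

def rotate(v):
    return [sum(H[i][j] * v[j] for j in range(4)) / 2 for i in range(4)]

def quant_roundtrip(v):
    # One-scale symmetric INT4 quantize + dequantize.
    scale = max(abs(x) for x in v) / 7
    return [scale * max(-8, min(7, round(x / scale))) for x in v]

def l2_err(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

w = [8.0, 0.5, -0.7, 0.3]  # one outlier dominates the quantization scale
direct = quant_roundtrip(w)
rotated = rotate(quant_roundtrip(rotate(w)))  # rotate, quantize, rotate back
print(l2_err(w, rotated) < l2_err(w, direct))  # rotation spreads the outlier
```

Because the rotation is orthogonal, it can be folded into adjacent layers at no runtime cost; learning the rotation (rather than fixing it, as here) is what the newer papers add.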

> In my personal opinion I don’t think the 1.58 bit work is going to make it into the mainstream.

I hope you're wrong! I'm more optimistic. Definitely a bit more work to be done, but still very promising.

> Being able to natively compute in lower precisions can be a huge performance boost at inference time.

ParoQuant is barely worse than FP16. Any less precise fractional representation is going to be worse than just using that IMO.


Eventually, yes. ParoQuant is hopefully the future here, 4-bit weights with no real degradation from FP16:

https://github.com/z-lab/paroquant


> not be so worried if that side has people you don't like on it.

I think the point is that they don't like Sony music because they are so often on the wrong side, this time included.


> They all make sense to me if we're trying to judge whether these tools are AGI, no?

As long as the mean and median human scores are clearly communicated, the scoring is fine. I think the human scores above would surprise people at first glance, even if they make sense once you think about it, so there's an argument to be made that scores can be misleading.


You could have a system where everyone is directly elected while keeping checks and balances, if voting were restricted: e.g. everyone can vote for a president/prime minister, but only non-teachers can vote for an education minister, only non-finance people can vote for something like the Fed chief, etc. The point is that the checks and balances now happen because other groups keep your group in check by voting.

Absolutely! That does keep some of the checks. You can do better than that though!

It's like on the Apollo missions where some parts were made by two completely different manufacturers and worked completely differently.

Hybrid political systems are best. Of course if we like democracy (and most people do), then that should be the most common kind of component. But I'd still like to have some different paradigms mixed into the system. And that's exactly what most modern constitutions do, for better or for worse.


I'd personally go for a two-chamber system (like congress/senate or commons/lords), with one chamber being elected and the other being chosen by sortition.

Maybe also a third chamber, where the weight of your vote is proportional to IQ (much more palatable in the EU than in the US).


This sounds like the opposite of what should be happening: an anti-technocracy, aiming for an electorate that's as uninformed as possible?

Why exclude teachers from picking the education minister? If we're restricting votes, shouldn't they be the only ones doing so instead?


This sounds great! TurboQuant does KV cache compression using quantization via rotations, and ParoQuant [1] does weight compression using quantization via rotations! So we can get 4-bit weights that match bf16 precision, and the KV cache goes down to 3 bits per key. This brings larger models and long contexts into the range of "possibly runnable" on beefy consumer hardware.

[1] https://github.com/z-lab/paroquant
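Back-of-the-envelope arithmetic on why this matters (all model dimensions below are illustrative round numbers I picked, not figures from either paper):

```python
def gib(n_bits):
    return n_bits / 8 / 2**30

# Hypothetical 70B-parameter model: 80 layers, 8 KV heads of 128 dims,
# served at a 128K-token context.
params = 70e9
kv_elems_per_token = 2 * 80 * 8 * 128  # keys + values, per token
context = 128_000

weights_bf16 = gib(params * 16)
weights_int4 = gib(params * 4)
kv_bf16 = gib(kv_elems_per_token * context * 16)
kv_3bit = gib(kv_elems_per_token * context * 3)

print(f"weights: {weights_bf16:.0f} GiB -> {weights_int4:.0f} GiB")
print(f"KV cache @ 128K: {kv_bf16:.0f} GiB -> {kv_3bit:.0f} GiB")
```

Under these assumptions, weights drop from ~130 GiB to ~33 GiB and the 128K KV cache from ~39 GiB to ~7 GiB, i.e. from "multi-GPU server" territory to something a dual-GPU or unified-memory consumer box could plausibly hold.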

