regularfry's comments

It is about the parameter count if what you care about is edge devices with limited RAM. Beyond a certain size your model just doesn't fit, and it doesn't matter how good it is: you still can't run it.
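As a rough back-of-envelope sketch of that constraint (the 7B parameter count and the quantization widths here are illustrative assumptions, not figures for any particular model):

```python
def model_ram_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB: parameters x bits per parameter.

    Ignores activations, KV cache, and runtime overhead, so real
    requirements are somewhat higher.
    """
    return n_params * bits_per_param / 8 / 1e9

# A hypothetical 7B-parameter model at different quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {model_ram_gb(7e9, bits):.1f} GB")
```

Even at 4-bit quantization, a 7B model needs a few GB just for weights, which already rules out a lot of small boards regardless of how capable the model is.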

Tangentially, have you got any idea what the equivalent "partial tokens revised" rate for humans is? I know I've consciously experienced backtracking and re-interpreting words before, and presumably it happens subconsciously all the time. But that means there's a bound on how low it's reasonable to expect that rate to be, and I don't have an intuition for what it is.

Oh, this is fantastic. I'm most interested to see if this reaches down to the Raspberry Pi Zero 2, because that's a whole new ball game if it does.

The school, in loco parentis.

I think it gets away with being more verbose because those two aren't spelled "#+BEGIN_SRC" and "DONE", they're "C-c , q" and "C-c t d" (from memory). I think unless you really commit to learning a decent subset of what org-mode provides the ergonomics are always going to seem a little clumsy. I've always found emacs shortcuts hard to learn, and because of that I've never quite got my use of org-mode over the activation hump to really stick for the long term. Every time I leave it and come back to it I have to relearn a lot of it from scratch because there doesn't seem to be any sort of intuitive framework I can hang it all off.

I made my own macro* to encapsulate the currently selected text in SOURCE \END_SOURCE tags. Now you're telling me there was a keyboard shortcut for that?

What else don't I know about emacs?

*It was my first macro/function, and while creating it I learned that 1. it wasn't that hard, and 2. with the help of an LLM you can program emacs a little even without deep knowledge of elisp. Though LLMs suggest very unreadable elisp code, and you have to rewrite everything.


The trick is to make the class of pre-approved service types as wide as possible, and make the tools to build them correctly the default. That minimises the number of things that need review in the first place.

Yes, providing paved paths that let people build quickly without approvals is really important, alongside inspection to find things that are potential issues.

That's one of the real problems. The other real problem is an active resistance to UI improvement simply because another CAD package did something similar.

FreeCAD doesn't resist those comparisons anymore. They happen regularly. Implementing change is just slow, and purely copying how xyz cad built their UI isn't always compatible with FreeCAD, so a lot of careful consideration goes into things before concepts from other software get implemented. Not to mention that developers seem to really dislike doing frontend work.

Aaaaah! No! You're doing it too! I am not talking about copying how xyz cad built their UI. I am not talking about consciously implementing concepts from other software. I'm talking about this crazy tendency to assume that the reason someone wants UI feature X is because xyz cad does it, not because it's a natural, intuitive thing to want to do. Natural, intuitive things tend to get independently invented; more than once I've made a suggestion and had "this isn't Fusion 360, you know" thrown back at me, despite the fact I've never used F360 to know what comparison they're making.

I wasn't implying you were doing that. I don't play the 'this isn't fusion or xyz cad' argument when someone brings a suggestion. I'm only stating that not every idea will work properly in the context of FreeCAD, but comparisons are considered when such suggestions are made.

The thing you are complaining about is the immediate dismissal the forum used to shoot back at people with ideas. Most likely a conditioned (and very toxic) response to receiving a lot of equally non-constructive feedback using things like F360 as a litmus test.

Either way, there is the Design Working Group, which evaluates ideas and feedback with a fair lens on what will work in the context of FreeCAD and what is feasible to implement without causing unnecessary disruption to existing users. It is a complex social paradox to deal with.


> Except they can't. Their costs are not magically lower when you use claude code vs when you use a third-party client.

I don't have a dog in this fight but is this actually true? If you're using Claude Code they can know that whatever client-side model selection they put into it is active. So if they can get away with routing 80% of the requests to Haiku and only route to Opus for the requests that really need it, that does give them a cost model where they can rely on lower costs than if a third-party client just routes to Opus for everything. Even if they aren't doing that sort of thing now, it would be understandable if they wanted to.
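A toy blended-cost sketch of that routing argument (the per-request costs and the 80/20 split below are hypothetical numbers for illustration, not Anthropic's actual pricing or routing policy):

```python
def blended_cost(cost_small: float, cost_large: float, frac_small: float) -> float:
    """Expected per-request cost when frac_small of traffic goes to the
    cheaper model and the remainder goes to the larger one."""
    return frac_small * cost_small + (1 - frac_small) * cost_large

# Third-party client: every request hits the big model.
all_large = blended_cost(0.01, 0.10, 0.0)

# First-party client with client-side selection: 80% routed to the small model.
routed = blended_cost(0.01, 0.10, 0.8)

print(f"all-large: ${all_large:.3f}/req, routed: ${routed:.3f}/req")
```

Under these made-up numbers the routed traffic costs well under a third as much per request, which is the kind of margin that would make controlling the client worthwhile.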


It (CC) does have a /models command; you can still decide to route everything to Opus if you just want to burn tokens. I guess it's not the default, so most wouldn't, but still, people willing to go to a third-party client are more likely to be that kind of power user anyway.

They still have the total consumption under their control (bar prompt caching and other specific optimizations), and in the past they even had different quotas per model. It shouldn't cost them more money, just be a worse/different service, I guess.


> it shouldn't cost them more money

As things are currently, better models mean bigger models that take more storage+RAM+CPU, or just spend more time processing a request. All this translates to higher costs, and may be mitigated by particular configs triggered by knowledge that a given client, providing particular guarantees, is on the other side.


That’s kind of the point. Even if users can choose which model to use (and apparently the default is the largest one), they could still say, for roughly the same cost: your Opus quota is X, your Haiku quota is Y, go ham. We’ll throttle you when you hit the limit.
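The per-model quota idea is simple arithmetic over a fixed budget (the $20 budget and per-million-token prices below are hypothetical, chosen only to show the shape of the calculation):

```python
def token_quota(budget_usd: float, usd_per_mtok: float) -> float:
    """Tokens affordable under a fixed budget at a given price per million tokens."""
    return budget_usd / usd_per_mtok * 1e6

budget = 20.0  # hypothetical monthly subscription value in USD
print(f"Opus quota:  {token_quota(budget, 15.0):,.0f} tokens")   # pricier model, small quota
print(f"Haiku quota: {token_quota(budget, 1.0):,.0f} tokens")    # cheaper model, big quota
```

The same dollar budget buys a much deeper quota on the cheap model, which is exactly the "X for Opus, Y for Haiku" trade-off described above.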

But they don't want the subscription to be quota'd like that. The API automatically does that, though, since different models use different amounts of tokens when generating responses and the billing is per token. That quite literally makes the user account for the actual costs of usage, which is the thing said users are trying to avoid, on their own terms, and get upset about when they can't.

> It (CC) does have a /models command, you can still decide to route everything to Opus if you just want to burn tokens I guess it's not default so most wouldn't

Opus is claude code's default model as of sometime recently (around Opus 4.6?)


That’s not how Claude Code works. It’s not like a web chatbot with a layer that routes based on complexity of request.

You don't control what happens when a request hits their endpoint though.

> Regarding the 4% improvement for human written AGENTS.md: this would be huge indeed if it were a _consistent_ improvement. However, for example on Sonnet 4.5, performance _drops_ by over 2%. Qwen3 benefits most and GPT-5.2 improves by 1-2%.

Ok, so that's interesting in itself. Apologies if you go into this in the paper; I've not had time to read it yet. But does this tell us something about the models themselves? Is there a benchmark lurking here? It feels like this is revealing something about the training, but I'm not sure exactly what.


It could... but as pointed out by others, the significance is unclear, and per-model results have even fewer samples than the benchmark average. So: maybe :)

So initially my thought was "why would this be better than existing infill patterns" but my second thought was that the reason Miura-ori patterns are interesting in the first place is because they fold. Not in this application so much, but in general, the way they flex is why they're interesting. The upshot here is that if you embedded that sort of pattern in a closed box, the degrees of freedom would try to transfer the force of a vertical load on the top to a horizontal stress in the outer shell of the base, in both x and y. A bit like a spherical dome.

I'm not sure that it's better than a dome; it might be for cases where you can't predict where on the top surface the load is going to be? I'm also not sure that a sheet of printed infill is sufficiently similar in its physical properties to a sheet of paper/card for this to transfer well, but it would be an interesting experiment to do.

