This is my hope as well. We now have time to write things a bit better. Comment on the PR with a quick improvement and it can just happen. But I’m failing to convince people at work. The majority seem to just be happy for code to go away and for us to never think about it again.
Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. Was just talking at work about how hard model training is for a small company, so we’d probably never do it. But with tools like this, and the new Unsloth release, training feels more in reach.
I have the opposite opinion. More information is always better. Absolutely the reporting requirements are onerous and there already are perverse incentives to chase things quarterly. Reducing reporting requirements is only going to make things worse though. The only solution I can imagine is to instead drop the reporting interval to instant. Make all public companies truly public. Reporting information should be accessible via a feed 24/7. There can be no more perverse incentives if there’s no hiding. Insane and unlikely? Sure, yeah. But let’s not pretend that reducing information is going to help anything.
Or even start with monthly. The problem with quarterly reporting is the internal efforts to "game" the quarter. The more aggressive disclosures are, the less of a shell game people can play to "make the report come out right."
Moving it to twice-yearly does the opposite. CEOs can now do the same amount of gaming with half the effort. Or twice the gaming with the same effort.
It doesn't even have to be that convoluted. Any sudden dangerous situation (natural disaster, break-in, medical emergency, positive or negative: what about a baby being born?) where a car is the only solution and a reasonable, but inebriated, person makes the better of two bad decisions is going to need an override. I don't want to be pessimistic, but this really seems like the sort of thing where a few people will die, lawsuits will happen, Congress will mandate an override or make it optional, and it'll be gone in maybe 10 years. It's madness, but seemingly this is how things are done.
I do not think it will happen, but this is why, in discussions about this happening or in historical fiction, the places that break off are typically the ones that were distinct _before_ they joined the US. Any of the 13 colonies (New England as a bloc having the strongest colonial identity that I'm aware of), Texas, or California are generally where it's assumed to start, as those were countries or had identities very much outside of the US while also having economies that might be OK.
This doesn't seem particularly formal. I remain unconvinced that reducing formality is really going to be valuable. Code obviously is as formal as it gets, but as you trend away from that you quickly introduce problems that arise from lack of formality. I could see a world in which we're all just writing tests in the form of something like Gherkin though.
People seem weirdly eager to talk to LLMs in proto-code instead of fixing the base problem that LLMs are just unreliable interpreters. If your tool needs a new human-friendly DSL to avoid the ambiguity of plain English, maybe what you really want is to be writing actual code or specs with a type system and feedback loop. Any halfway formalism gives a false sense of precision, and you still get blindsided by the same model quirks, just dressed up differently.
> I could see a world in which we're all just writing tests in the form of something like Gherkin though.
Yes, and the implementation... no one actually cares about that. This would be a good outcome in my view. What I see is people letting LLMs "fill in the tests", whereas I'd rather tests be the only thing humans write.
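To make that split concrete, here is a minimal sketch of what "humans write only the test" might look like. Everything in it is hypothetical: the scenario wording, the `apply_discount` function, and the tiny `run_scenario` interpreter are made up for illustration and are not taken from any real Gherkin tooling (real projects would use something like Cucumber or behave). It only handles the three step phrasings shown.

```python
import re

# The part a human would write: a Gherkin-style scenario (hypothetical wording).
SCENARIO = """
Scenario: Loyal customer gets a discount
  Given a cart total of 200
  When a 10 percent discount is applied
  Then the total is 180
"""


def apply_discount(total: float, percent: float) -> float:
    # Stand-in implementation; in the world described above an LLM might
    # write this and nobody would read it closely.
    return total * (1 - percent / 100)


def run_scenario(text: str) -> bool:
    """Toy interpreter for the three step phrasings used in SCENARIO."""
    state = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if m := re.fullmatch(r"Given a cart total of (\d+)", line):
            state["total"] = float(m.group(1))
        elif m := re.fullmatch(r"When a (\d+) percent discount is applied", line):
            state["total"] = apply_discount(state["total"], float(m.group(1)))
        elif m := re.fullmatch(r"Then the total is (\d+)", line):
            # Compare with a small tolerance because the totals are floats.
            return abs(state["total"] - float(m.group(1))) < 1e-9
    raise ValueError("scenario has no Then step")
```

Calling `run_scenario(SCENARIO)` checks the stand-in implementation against the human-written steps; swapping in a different `apply_discount` would not require touching the scenario text.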
While I'm also a bit skeptical, I think some formalism could really simplify everything. The programming world has lots of words that mean close to the same thing (subroutine, method, function, etc.). Why not choose one and stick to it for interactions with the LLM? It should save plenty of complexity.
I don't know that that graph, to me, shows Sonnet 4.5 as worse than 3.7. Maybe the automated grader is finding code breakages in 3.7 and not breaking that out? I'd much prefer to add code that is a different style to my codebase than code that breaks other code. But even ignoring that, the pass rate is almost identical between the two models.