More

all_factz · 2026-03-01T09:35:10 1772357710

“Which side”? What other side is there in Iran? You think there’s some shadow government that can realistically topple the mullahs from within? The only way the Shah comes back is with US boots on the ground, which would be a disaster for other reasons. Until that happens this is just reckless action that makes the regime even more radical than it already is.

tim333 · 2026-03-01T13:16:11 1772370971

I'm not sure - I'm not that up on all that but there's the

>coalition of liberal and nationalist political parties selected Reza Pahlavi to lead a transitional government until the realisation of democratic elections https://en.wikipedia.org/wiki/Iranian_opposition#:~:text=On%...

thing. Maybe if enough Iranian people back that?

bluGill · 2026-03-01T11:55:56 1772366156

There are a lot of well educated people in iran who were unhappy. Iran killed more than 30,000 protesters last month, and there are who knows how many more left.

only time will tell. I give iran much better than average odds this is for the better. Though the average is really bad: bad results would not surprise me.

all_factz · 2026-02-02T05:10:36 1770009036

VENI/VIDI/VICI are easy for anyone who studied Latin (as indeed used to be common), and ARIA is similarly easy for anyone who knows about opera. Basically, the crossword is for snobs.

swores · 2026-02-02T07:18:27 1770016707

I agree that crosswords often include cultural references that lean towards certain demographics / assuming particular education, and that can feel exclusionary if you don’t share that background - and there's even an argument to suggest snobbery might be behind those choices.

But I disagree that that makes it for snobs. Snobbery is more about an attitude of looking down on others or their tastes, whereas knowing Latin or being a fan of opera is really just about exposure.

Sure, there exist some (too many) opera fans who would say something like "it's real art compared to pop or hip hop being low class trash", but that's not a defining part of liking opera and plenty of people who like opera aren't snobs. Ironically it's a different form of snobbery (sometimes called reverse snobbery though personally I hate that term), to dismiss anyone who learned Latin or who likes opera as being a snob!

all_factz · 2026-01-01T16:21:15 1767284475

“I reject demonisation of the wealthy” is quite an odd thing for someone identifying as “far left” to say. But then you go on to identify as a “left liberal” - canonically not considered far left - so maybe I shouldn’t be surprised.

Whether it’s worth demonizing anyone or not, we can condemn actions that hurt innocent people and we can maintain skepticism of the ultra-wealthy and their motives without “bullying”. It does sound like your principle of compassion extends a little too much towards capital and not enough towards labor.

Lerc · 2026-01-01T17:11:07 1767287467

Therein lies the rub, when people are surprised to see a left wing person criticising the idea of othering, you have to wonder what principles they have left to call left wing.

>we can condemn actions that hurt innocent people and we can maintain skepticism of the ultra-wealthy and their motives without “bullying”.

Indeed and I do condemn actions. What I don't do is conddemn people.

I am for robust regulation, free expression, free movement, worker rights, Limiting wealth inequality, free fundimental services of health educatiion. I want more police but with fewer powers. I support harm minimalization over punishing drug users, I favour rehabilitation in prison over training recidivists, I am against hate in all its forms. My most extreme views would be that advertising is inherently harmful to society, and teaching any religion as true to someone under the age of consent is child abuse.

All of these come from the principle that I think all people have feelings,worthless and rights, they deserve the best we can provide for them. If they disagree with you the first step is trying to understand their point of view.

To me, imposing your will on others, dismissing people for thinking the wrong thing, shunning them for saying the wrong thing or associating with 5he wrong people, these are all properties that stand at the other end of the spectrum to me. I don't particularly care what label you put on the ideology over there, but whatever it is, those are the attributes that have caused some of the darkest moments in h7man history.

all_factz · 2026-01-01T17:31:50 1767288710

Sure, I agree. Kneejerk condemnation and othering is bad.

But there’s a need to balance even-handedness with a healthy skepticism of those in power. Otherwise you risk becoming an apologist. No one is saying not to do your homework or not to think critically, but we’re also saying not to come in guns blazing in defense of moneyed interests. That’s what the person who brought up the hidden agenda stuff seemed to be doing - making assumptions that favor capital without even taking the time to read the article that addressed those assumptions. That’s not even-handed, it’s biased against labor.

Lerc · 2026-01-01T21:44:21 1767303861

Consider the original post I responded to. It asked two questions.

>are you skeptical xai would wiggle around regulations and pollute a city?

But they were responding to I'm incredibly skeptical of any claim that xAI's power use is putting a dent in the local environment which makes no claim as to whether they might obey or disobey regulations, the words "putting a dent" in the local envionnent*

The data from the article does not sufficiently address this, it uses satellite data and a short time frame. Without specifying the resolution of their data (which could be kilometer sized pixels) their claims about locality is in doubt. In short term measures, trends are harder to spot, a rise over months could just mean it is less windy in the nollowing season. Without a ground level meadurement of the air quality and a evaluation of the total local emission from all sources, you cannot hope to measure the health impact of a single cause.

None of that says that they are not polluting. What it says is that this is not evidence of it. Someone expressed skepticism based on the proportional emission of one of many of their ability to move the dial, and was challenged based upon the likelihood of what they might do. Claiming skeptasism that a thief could rob Fort Knox, is not a claim that the thief is honest

>by “agenda pushing” do you mean those who have an agenda to have breathable air?

I simply cannot believe that this is a reasonable interpretation of what they said.

That's the thing that motivated me to post on this thread. That the first post I responded to here was attacking the player, not the ball.

I continue to post replies here out of my own sense of duty to fully explain my position to promote understanding, I'm not trying to win anything here, I only want people to see a honestly held perspective.

all_factz · 2025-12-30T00:32:30 1767054750

React is hundreds of thousands of lines of code (or millions - I haven’t looked in awhile). Sure, you can start by having the LLM create a simple way to sync state across components, but in a serious project you’re going to run into edge-cases that cause the complexity of your LLM-built library to keep growing. There may come a point at which the complexity grows to such a point that the LLM itself can’t maintain the library effectively. I think the same rough argument applies to MomentJS.

simonw · 2025-12-30T01:04:14 1767056654

If the complexity grows beyond what it makes sense to do without React I'll have the LLM rewrite it all in React!

I did that with an HTML generation project to switch from Python strings to Jinja templates just the other day: https://github.com/simonw/claude-code-transcripts/pull/2

DrammBA · 2025-12-30T01:38:25 1767058705

Simon, you're starting to sound super disconnected from reality, this "I hit everything that looks like a nail with my LLM hammer" vibe is new.

simonw · 2025-12-30T01:42:00 1767058920

My habits have changed quite a bit with Opus 4.5 in the past month. I need to write about it..

godelski · 2025-12-30T02:59:25 1767063565

What's concerning to many of us is that you've (and others) have said this same thing s/Opus 4.5/some other model/

That feels more like chasing than a clear line of improvement. It's interrupted very different from something like "my habits have changed quite a bit since reading The Art of Computer Programming". They're categorically different.

mkozlows · 2025-12-30T06:53:36 1767077616

It's because the models keep getting better! What you could do with GPT-4 was more impressive than what you could do with GPT 3.5. What you could do with Sonnet 3.5 was more impressive yet, and Sonnet 4, and Sonnet 4.5.

Some of these improvements have been minor, some of them have been big enough to feel like step changes. Sonnet 3.7 + Claude Code (they came out at the same time) was a big step change; Opus 4.5 similarly feels like a big step change.

(If you don't trust vibes, METR's task completion benchmark shows huge improvements, too.)

If you're sincerely trying these models out with the intention of seeing if you can make them work for you, and doing all the things you should do in those cases, then even if you're getting negative results somehow, you need to keep trying, because there will come a point where the negative turns positive for you.

If you're someone who's been using them productively for a while now, you need to keep changing how you use them, because what used to work is no longer optimal.

godelski · 2025-12-30T09:37:20 1767087440

Models keep getting better but the argument I'm critiquing stays the same.

So does the comment I critiqued in the sibling comment to yours. I don't know why it's so hard to believe we just haven't tried. I have a Claude subscription. I'm an ML researcher myself. Trust me, I do try.

But that last part also makes me keenly aware of their limitations and failures. Frankly I don't trust experts who aren't critiquing their field. Leave the selling points to the marketing team. The engineer and researcher's job is to be critical. To find problems. I mean how the hell do you solve problems if you're unable to identify them lol. Let the marketing team lead development direction instead? Sounds like a bad way to solve problems

  > benchmark shows huge improvements

Benchmarks are often difficult to interpret. It is really problematic that they got incorporated into marketing. If you don't understand what a benchmark measures, and more importantly, what it doesn't measure, then I promise you that you're misunderstanding what those numbers mean.

For METR I think they say a lot right here (emphasis my own) that reinforces my point

  > Current frontier AIs are vastly better than humans at text prediction and knowledge tasks. They outperform experts on most *exam-style problems* for a fraction of the cost. ... And yet the best AI agents are not currently able to carry out substantive projects by themselves or directly substitute for human labor. *They are unable to reliably handle even relatively low-skill*, computer-based work like remote executive assistance. It is clear that capabilities are increasing very rapidly in some sense, but it is unclear how this corresponds to real-world impact.

So make sure you're really careful to understand what is being measured. What improvement actually means. To understand the bounds.

It's great that they include longer tasks but also notice the biases and distribution in the human workers. This is important in properly evaluating.

Also remember what exactly I quoted. For a long time we've all known that being good at leetcode doesn't make one a good engineer. But it's an easy thing to test and the test correlates with other skills that are likely to be learned to be good at those tests (despite being able to metric hack). We're talking about massive compression machines. That pattern match. Pattern matching tends to get much more difficult as task time increases but this is not a necessary condition.

Treat every benchmark adversarialy. If you can't figure out how to metric hack it then you don't know what a benchmark is measuring (and just because you know what can hack it doesn't mean you understand it nor that that's what is being measured)

mkozlows · 2025-12-30T15:40:44 1767109244

I think you should ask yourself: If it were true that 1) these things do in fact work, 2) these things are in fact getting better... what would people be saying?

The answer is: Exactly what we are saying. This is also why people keep suggesting that you need to try them out with a more open mind, or with different techniques: Because we know with absolute first-person iron-clad certainty what is possible, and if you don't think it's possible, you're missing something.

nl · 2025-12-30T10:11:07 1767089467

I don't understand what your argument is.

It seems to be "people keep saying the models are good"?

That's true. They are.

And the reason people keep saying it is because the frontier of what they do keeps getting pushed back.

Actual, working, useful code completion in the GPT 4 days? Amazing! It could automatically write entire functions for me!

The ability to write whole classes and utility programs in the Claude 3.5 days? Amazing! This is like having a junior programmer!

And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!

But now we are beginning to see that programming in 6 months time might look very different to now because these AI system code very differently to us. That's exactly the point.

So what is it you are arguing against?

I think you said you didn't like that people are saying the same thing, but in this post it seems more complicated?

timr · 2025-12-30T14:50:07 1767106207

> And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!

People have been doing this parlor trick with various "substantial" programs [1] since GPT 3. And no, the models aren't better today, unless you're talking about being better at the same kinds of programs.

[1] If I have to see one more half-baked demo of a running game or a flight sim...

simonw · 2025-12-30T16:11:27 1767111087

"And no, the models aren't better today"

Can you expand on that? It doesn't match my experience at all.

timr · 2025-12-30T16:36:16 1767112576

It’s a vague statement that I obviously cannot defend in all interpretations, but what I mean is: the performance of models at making non-trivial applications end-to-end, today, is not practically better than it was a few years ago. They’re (probably) better at making toys or one-shotting simple stuff, and they can definitely (sometimes) crank out shitty code for bigger apps that “works”, but they’re just as terrible as ever if you actually understand what quality looks like and care to keep your code from descending into entropy.

I think "substantial" is doing a lot of heavy lifting in the sentence I quoted. For example, I’m not going to argue that aspects of the process haven’t improved, or that Claude 4.5 isn't better than GPT 4 at coding, but I still can’t trust any of the things to work on any modestly complex codebase without close supervision, and that is what I understood the broad argument to be about. It's completely irrelevant to me if they slay the benchmarks or make killer one-shot N-body demos, and it's marginally relevant that they have better context windows or now hallucinate 10% less often (in that they're more useful as tools, which I don't dispute at all), but if you want to claim that they're suddenly super-capable robot engineers that I can throw at any "substantial" problem, you have to bring evidence, because that's a claim that defies my day-to-day experience. They're just constantly so full of shit, and that hasn't changed, at all.

FWIW, this line of argument usually turns into a mott and bailey fallacy, where someone makes an outrageous claim (e.g. "models have recently gained the ability to operate independently as a senior engineer!"), and when challenged on the hyperbole, retreats to a more reasonable position ("Claude 4.5 is clearly better than GPT 3!"), but with the speculative caveat that "we don't know where things will be in N years". I'm not interested in that kind of speculation.

simonw · 2025-12-30T19:21:09 1767122469

Have you spent much time with Codex 5.1 or 5.2 in OpenAI Codex or a Claude Opus 4".5 in Claude code over the last ~6 weeks?

I think they represent a meaningful step change in what models can build. For me they are the moment we went from building relatively trivial things unassisted to building quite large and complex system that take multiple hours, often still triggered by a single prompt.

Some personal examples from the past few weeks.

- A spec-compliant HTML5 parsing library by Codex 5.2: https://simonwillison.net/2025/Dec/15/porting-justhtml/

- A CLI-based transcript export and publishing tool by Opus 4.5: https://simonwillison.net/2025/Dec/25/claude-code-transcript...

- A full JavaScript interpreter in dependency/free Python (!) https://github.com/simonw/micro-javascript - and here's that transcript published using the above-mentioned tool: https://static.simonwillison.net/static/2025/claude-code-mic...

- A WebAssembly runtime in Python which I haven't yet published

The above projects all took multiple prompts, but were still mostly built by prompting Claude Code for web on my iPhone in between Christmas family things.

I have a single-prompt one:

- A Datasette plugin that integrates Cloudflare's CAPTCHA system: https://github.com/simonw/datasette-turnstile - transcript: https://gistpreview.github.io/?2d9190335938762f170b0c0eb6060...

I'm not confident any of these projects would have worked with the coding agents and models we had had four months ago. There is no chance they would've worked with the January 2025 available models.

lordmauve · 2025-12-31T07:50:55 1767167455

Are you using Stop hooks to keep Claude running on a task until it completes, or is it doing that by itself?

simonw · 2025-12-31T07:59:19 1767167959

I'm not using those yet.

I mainly eat it clear tasks like "keep going until all these tests pass", but I do keep an eye on it and occasionally tell it to keep going.

timr · 2025-12-31T03:09:24 1767150564

I’ve used Sonnet 4.5 and Codex 5 and 5.1, but not in their native environment [1].

Setting aside the fact that your examples are mostly “replicate this existing thing in language X” [2], again, I’m not saying that the models haven’t gotten better at crapping out code, or that they’re not useful tools. I use them every day. They're great tools, when someone actually intelligent is using them. I also freely concede that they're better tools than a year ago.

The devil is (as always) in the details: how many prompts did it take? what exactly did you have to prompt for? how closely did you look at the code? how closely did you test the end result? Remember that I can, with some amount of prompting, generate perfectly acceptable code for a complex, real-world app, using only GPT 4. But even the newest models generate absolute bullshit on a fairly regular basis. So telling me that you did something complex with an unspecified amount of additional prompting is fine, but not particularly responsive to the original claim.

[1] Copilot, with a liberal sprinkling of ChatGPT in the web UI. Please don’t engage in “you’re holding it wrong” or "you didn't use the right model" with me - I use enough frontier models on a regular basis to have a good sense of their common failings and happy paths. Also, I am trying to do something other than experiment with models, so if I have to switch environments every day, I’m not doing it. If I have to pay for multiple $200 memberships, I’m not doing it. If they require an exact setup to make them “work”, I am unlikely to do it. Finally, if your entire argument here hinges on a point release of a specific model in the last six weeks…yeah. Not gonna take that seriously, because it's the same exact argument, every six weeks. </caveats>

[2] Nothing really wrong with this -- most programming is an iterative exercise of replicating pre-existing things with minor tweaks -- but we're pretty far into the bailey now, I think. The original argument was that you can one-shot a complex application. Now we're in "I can replicate a large pre-existing thing with repeated hand-holding". Fine, and completely within my own envelope for model performance, but not really the original claim.

simonw · 2025-12-31T08:09:35 1767168575

I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?

Copilot style autocomplete or chatting with a model directly is an entirely different experience from letting the model spend half an hour writing code, running that code and iterating on the result uninterrupted.

Here's an example where I sent a prompt at 2:38pm and it churned away for 7 minutes (executing 17 bash commands), then I gave it another prompt and it churned for half an hour and shipped 7 commits with 160 passing tests: https://static.simonwillison.net/static/2025/claude-code-mic...

I completed most of that project on my phone.

timr · 2025-12-31T08:46:28 1767170788

> I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?

edit: I wrote a different response here, then I realized we might be talking about different things.

Are you asking if I let the agents use tools without my prior approval? I do that for a certain subset of tools (e.g. run tests, do requests, run queries, certain shell commands, even use the browser if possible), but I do not let the agents do branch merges, deploys, etc. I find that the best models are just barely good enough to produce a bad first draft of a multi-file feature (e.g. adding an entirely new controller+view to a web app), and I would never ever consider YOLOing their output to production unless I didn't care at all. I try to get to tests passing clean before even looking at the code.

Also, I am happy to let Copilot burn tokens in this manner and will regularly do it for refactors or initial drafts of new features, I'm honestly not sure if the juice is worth the squeeze -- I still typically have to spend substantial time reworking whatever they create, and the revision time required scales with the amount of time they spend spinning. If I had to pay per token, I'd be much more circumspect about this approach.

simonw · 2025-12-31T14:08:21 1767190101

Yes, that's what I meant. I wasn't sure if you meant classic tab-based autocomplete or Copilot tool-based agent Copilot.

Letting it burn tokens on running tests and refactors (but not letting it merge branches or deploy) is the thing that feels like a huge leap forward to me. We are talking about the same set of capabilities.

timr · 2026-01-01T05:38:04 1767245884

Ah, definitely agent-based copilot. I don't even have the autocomplete stuff turned on anymore, because I found it annoying.

nl · 2025-12-31T07:19:38 1767165578

What do you class a "substantial program"?

For me it is something I can describe in a single casual prompt.

For example I wrote a fully working version of https://tools.nicklothian.com/llm_comparator.html in a single prompt. I refined it and added features with more prompts, but it worked from the start.

timr · 2025-12-31T08:51:17 1767171077

Good question. No strict line, and it's always going to be subjective and a little bit silly to categorize, but when I'm debating this argument I'm thinking: a product that does not exist today (obviously many parts of even a novel product will be completely derivative, and that's fine), with multiple views, controllers, and models, and a non-trivial amount of domain-specific business logic. Likely 50k+ lines of code, but obviously that's very hand-wavy and not how I'd differentiate.

Think: SaaS application that solves some domain specific problem in corporate accounting, versus "in-browser speadsheet", or "first-person shooter video game with AI, multi-player support, editable levels, networking and high-resolution 3D graphics" vs "flappy bird clone".

When you're working on a product of this size, you're probably solving problems like the ones cited by simonw multiple times a week, if not daily.

nl · 2025-12-31T11:54:32 1767182072

I don't think anyone is claiming they can one-shot a 50k line SAAS app.

I think you'd get close on something like Lovable but that's not really one shot either.

nl · 2025-12-31T15:17:24 1767194244

But re-reading your statement you seem to be claiming that there are no 50k SAAS apps that are build even using multi-shot techniques (ie, building a feature at a time).

In that case my Vibe-Prolog project would count: https://github.com/nlothian/Vibe-Prolog/

  - It's 45K of python code
  - It isn't a duplicate of another program (indeed, the reason it isn't finished is because it is stuck between ISO Prolog and SWI Prolog and I need to think about how to resolve this, but I don't know enough Prolog!)
  - Not a *single* line of code is hand written.

Ironically this doesn't really prove that the current frontier models are better because large amounts of code were written with non-frontier models (You can sort of get an idea of what models were used with the labels on https://github.com/nlothian/Vibe-Prolog/pulls?q=is%3Apr+is%3...)

But - importantly - this project is what convinced me that the frontier models are much better than the previous generation. There were numerous times I tried the same thing in a non-Frontier model which couldn't do it, and then I'd try it in Claude, Codex or Gemini and it would succeed.

pianopatrick · 2025-12-30T17:22:46 1767115366

Is there an endpoint for AI improvement? If we can go from functions to classes to substantial programs then it seems like just a few more steps to rewriting whole software products and putting a lot of existing companies out of business.

"AI, I don't like paying for my SAP license, make me a clone with just the features I need".

godelski · 2025-12-31T04:48:34 1767156514

Two things seem to be in contention:

  - Models keep getting better[0]
  - Models since GPT 3 are able to replace junior developers

It's true that both of these can be true at the same time but they are still in contention. We're not seeing agents ready to replace mid level engineersand quite frankly I've yet to see a model actually ready to replace juniors. Possibly low end interns but the major utility of interns is to trial run employment. Frankly it still seems like interns and juniors are advancing faster than these models in the type of skills that matter for companies (not to mention that institutional knowledge is quite valuable). But there's interns that started when GPT 3.5 came out that are seniors now.

The problem is we've been promised that these employees would be replaced[1] any day now, yet that's not happening.

People forget, it is harder to advance when you're already skilled. It's not hard to go from non-programmer to a junior level. Hard to go from junior to senior. And even harder to advance to staff. The difficulty level only increases. This is true for most skills and this is where there's a lot of naivity. We can be advancing faster while the actual capabilities begin to crawl forward rather than leap.

[0] Implication is not just at coding test style questions but also in more general coding development.

[1] Which has another problem in the pipeline. If you don't have junior devs and are unable to replace both mid and seniors by the time that a junior would advance to a senior then you have built a bubble. There's a lot of big bets being made that this will happen yet the evidence is not pointing that way.

pertymcpert · 2025-12-30T04:25:39 1767068739

Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.

Why do you use the word "chasing" to describe this? I don't understand. Maybe you should try it and compare it to earlier models to see what people mean.

godelski · 2025-12-30T05:41:56 1767073316

  > Why do you use the word "chasing" to describe this?

I think you'll get the answer to this if you read my comment and your response to understand why you didn't address mine.

Btw, I have tried it. It's annoying that people think the problem is not trying. It was getting old when GPT 3.5 came out. Let's update the argument...

v64 · 2025-12-30T02:30:34 1767061834

Looking forward to hearing about how you're using Opus 4.5, from my experience and what I've heard from others, it's been able to overcome many obstacles that previous iterations stumbled on

remich · 2025-12-30T02:54:52 1767063292

Please do. I'm trying to help other devs in my company get more out of agentic coding, and I've noticed that not everyone is defaulting to Opus 4.5 or even Codex 5.2, and I'm not always able to give good examples to them for why they should. It would be great to have a blog post to point to…

indigodaddy · 2025-12-30T02:55:20 1767063320

Can you expound on Opus 4.5 a little? Is it so good that it's basically a superpower now? How does it differ from your previous LLM usage?

pertymcpert · 2025-12-30T04:26:02 1767068762

To repeat my other comment:

> Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.

dimitri-vs · 2025-12-30T02:31:32 1767061892

Reality is we went from LLMs as chatbots editing a couple files per request with decent results. To running multiple coding agents in parallel to implement major features based on a spec document and some clarifying questions - in a year.

Even IF llms don't get any better there is a mountain of lemons left to squeeze in their current state.

zdragnar · 2025-12-30T01:15:01 1767057301

That would go over on any decently sized team like a lead balloon.

simonw · 2025-12-30T01:45:02 1767059102

As it should, normally, because "we'll rewrite it in React later" used to represent weeks if not months of massively disruptive work. I've seen migration projects like that push on for more than a year!

The new normal isn't like that. Rewrite an existing cleanly implemented Vanilla JavaScript project (with tests) in React the kind of rote task you can throw at a coding agent like Claude Code and come back the next morning and expect most (and occasionally all) of the work to be done.

reactordev · 2025-12-30T03:57:14 1767067034

I’m going to add my perspective here as they seem to all be ganging up on you Simon.

He is right. The game has changed. We can now refactor using an agent and have it done by morning. The cost of architectural mistakes is minimal and if it gets out of hand, you refactor and take a nap anyway.

What’s interesting is now it’s about intent. The prompts and specs you write, the documents you keep that outline your intended solution, and you let the agent go. You do research. Agent does code. I’ve seen this at scale.

zdragnar · 2025-12-30T02:54:01 1767063241

And everyone else's work has to be completely put on hold or thrown away because you did the whole thing all at once on your own.

That's definitely not something that goes over well on anything other than an incredibly trivial project.

pertymcpert · 2025-12-30T04:28:31 1767068911

Why did you jump to the assumption that this:

> The new normal isn't like that. Rewrite an existing cleanly implemented Vanilla JavaScript project (with tests) in React the kind of rote task you can throw at a coding agent like Claude Code and come back the next morning and expect most (and occasionally all) of the work to be done.

... meant that person would do it in a clandestine fashion rather than this be an agreed upon task prior? Is this how you operate?

zdragnar · 2025-12-30T05:26:10 1767072370

My very first sentence:

> And everyone else's work has to be completely put on hold

On a big enough team, getting everyone to a stopping point where they can wait for you to do your big bang refactor to the entire code base- even if it is only a day later- is still really disruptive.

The last time I went through something like this, we did it really carefully, migrating a page at a time from a multi page application to a SPA. Even that required ensuring that whichever page transitioned didn't have other people working on it, let alone the whole code base.

Again, I simply don't buy that you're going to be able to AI your way through such a radical transition on anything other than a trivial application with a small or tiny team.

nl · 2025-12-30T10:20:05 1767090005

> meant that person would do it in a clandestine fashion rather than this be an agreed upon task prior? Is this how you operate?

This doesn't mean this at all

In an AI heavy project it's not unusual to have many speculative refactors kicked off and then you come back to see what it is like.

Wonder you can do a Rust SIMD optimized version of that Numpy code you have? Try it! You don't even need to waste review time on it because you have heavy test coverage and can see if it is worth looking at.

zeroonetwothree · 2025-12-30T05:18:37 1767071917

If you have 100s of devs working on the project it’s not possible to do a full rewrite in one go. So its to about clandestine but rather that there’s just no way to get it done regardless of how much AI superpowers you bring to bear.

Teever · 2025-12-30T05:34:02 1767072842

Let's say I'm mildly convinced by your argument. I've read your blog post that was popular on HN a week or so ago and I've made similar little toy programs with AI that scratch a particular niche.

Do you care to make any concrete predictions on when most developers will embrace this new normal as part of their day to day routine? One year? Five?

And how much of this is just another iteration in the wheel of recarnation[0]? Maybe we're looking at a future where we see return to the monoculture library dense supply chain that we use today but the libraries are made by swarms of AI agents instead and the programmer/user is responsible for guiding other AI agents to create business logic?

[0] https://www.computerhope.com/jargon/w/wor.htm

simonw · 2025-12-30T06:23:47 1767075827

It's really hard to predict how other developers are going to work, especially given how resistant a lot of developers are to fully exploring the new tools.

I do think there's been a bit of a shift in the last two months, with GPT 5.1 and 5.2 Codex and Opus 4.5.

We have models that can reliably follow complex instructions over multiple hour projects now - that's completely new. Those of us at the cutting edge are still coming to terms with the consequences of this (as illustrated by this Karpathy tweet).

I don't trust my predictions myself, but I think the next few months are going to see some big changes in terms of what mainstream developers understand these tools as being capable of.

mkozlows · 2025-12-30T06:58:32 1767077912

"The future is already here, it's just unevenly distributed."

At some companies, most developers already are using it in their day to day. IME, the more senior the developer is, the more likely they are to be heavily using LLMs to write all/most of their code these days. Talking to friends and former coworkers at startups and Big Tech (and my own coworkers, and of course my own experience), this isn't a "someday" thing.

People who work at more conservative companies, the kind that don't already have enterprise Cursor/Anthropic/OpenAI agreements, and are maybe still cautiously evaluating Copilot... maybe not so much.

chairmansteve · 2025-12-30T05:59:34 1767074374

"React is hundreds of thousands of lines of code".

Most of which are irrelevant to my project. It's easier to maintain a few hundred lines of self written code than to carry the react-kitchen-sink around for all eternity.

wanderlust123 · 2025-12-30T03:17:37 1767064657

Not all UIs converge to a React like requirement. For a lot of use cases React is over-engineering but the profession just lacks the balls to use something simpler, like htmx for example.

all_factz · 2025-12-30T04:05:42 1767067542

Sure, and for those cases I’d rather tell the agent to use htmx instead of something hand-rolled.

zeroonetwothree · 2025-12-30T05:19:40 1767071980

Core react is fairly simple, I would have no problem using it for almost everything. The overengineering usually comes at a layer on top.

all_factz · 2025-11-03T05:07:18 1762146438

We need dinners, we don’t need AI

all_factz · 2025-09-17T02:54:18 1758077658

Meh, sort of. Just because LLMs let you output reams of code, doesn’t mean you should use them to do that. As always, you should make the smallest diff that would accomplish your goal. Working this way, LLMs don’t really accelerate my workflow much except for work that’s truly boilerplate and for refactoring. But for the sort of small-ish changes that iterate towards product-market fit, I find I have to spend more time trying to get Claude to do what I want than just writing the code I need by hand.

scuff3d · 2025-09-17T03:40:17 1758080417

Just today it gave me a bunch of deprecated MongoDB calls and completely botched some async Python code. But it's definitely gonna be writing all the code soon. Just six more months...

calmworm · 2025-09-17T03:15:40 1758078940

I’ve certainly found my bogus “5 hour limit” on the pro plan used up multiple times arguing with Claude about the simplest of concepts. So much in fact that I feel it’s by design to push users towards the Max plans… even if not true the fact that I think it at all is a loss for them.

scuff3d · 2025-09-17T03:44:31 1758080671

We already know software companies are intentionally making their products shittier to drive profits. Google making their search worse to increase the number of times people have to search (so they see more ads) is a good example.

There is absolutely no reason to not think AI companies aren't doing the same. Dial in the accuracy so that each tier is only so useful, constantly and subtlety encouraging you to pay a little more for just a few more queries because "the next prompt will make it work, I'm sure this time!"

all_factz · 2025-06-23T03:03:46 1750647826

Because Rooney writes dressed-up romance novels. They’re mediocre. But to the author’s point, at least they’re readable and touch on real emotion, even if I find them a bit trite. Some balance could be struck between Rooney and the more highbrow experiments the author bemoans. Elena Ferrante is a decent example here, and she does get sales. But even then, I do think the author understates the competition reading faces from new forms of media - yes, there are more people, but the amount of distraction has scaled disproportionately to the number of new people. Massively.

all_factz · 2025-06-22T03:45:03 1750563903

Have you been to Israel? I have cousins there. When I was 14 and visited, my 19 year old cousin told me we need to kill all the Arabs because “if we exile them, they will just come back.” Do you really think (a large segment of) Israelis are less crazy than (a large segment of) Iranians?

yonisto · 2025-06-22T17:57:03 1750615023

No. People are crazy everywhere. That is not the same as the actual leaders of the country. The one that are calling the shots making the same claims for 46 years.

Now, I don't know if you noticed, your cousins while they are not kind to Arabs (which if you had Arab cousins you would have noticed that they are not very kind to Jews), have nothing whatsoever with Iran, no more than they have anything with Napal.

1500km away!

all_factz · 2025-06-23T11:28:31 1750678111

That’s a little simplistic. Iranians feel, somewhat justifiably, that they and the Arab world have been pushed around by the West for over 100 years. The Jihadism we bemoan today didn’t arise in a vacuum - it is at least partially a reaction to Western interference in Middle Eastern affairs (recall how the US deposed a democratically elected Iranian leader). Israel is one such example of this Western interference, and while I obviously have the utmost sympathy for Israelis - having family there - I do think not enough Westerners are willing to see this from the Arab/Iranian PoV. There’s a reason they dislike us, and it’s not just that they’re fanatics. Negotiation would be more fruitful if we didn’t typecast our enemies as unreasoning fundamentalists.

all_factz · 2025-06-22T03:38:11 1750563491

Iran has shown itself a rational actor time and time again by not escalating against continued provocation by Israel and the US, knowing that to do so would be to enter a conflict it can’t win. That’s not the behavior of an irrational actor who’s willing to fight whatever the cost, even total annihilation (which would be what happened if Iran nuked the US/Israel).

They may be religious fanatics, but they’re not idiots.

margalabargala · 2025-06-22T04:29:12 1750566552

Iran funded Hamas who did October 7th. That is the original escalation that kicked all this off. The region was (relatively) quiet until then.

all_factz · 2025-06-22T04:36:11 1750566971

October 7th was a reaction to Trump’s “Abraham Accords” which benefitted Sunni countries at the expense of Iran.

margalabargala · 2025-06-22T13:35:11 1750599311

And also a gigantic, gigantic escalation.

"You made a deal that disadvantages us so we're going to rape and murder a bunch of teenagers and kidnap people."

all_factz · 2025-06-23T11:22:46 1750677766

It’s not a justification, obviously. But it is a (partial) explanation. Israel wanted to keep sweeping the Palestinian issue under the rug, and Hamas and its sponsors were not going to allow that.

margalabargala · 2025-06-23T14:09:46 1750687786

You claimed that Iran "time and again does not escalate against provocations by the US and Israel".

Sure, it's an explanation. Everything shitty Israel or Hamas has done since 1950 has an "explanation" of "that other group did something to me".

FergusArgyll · 2025-06-22T10:54:10 1750589650

Iran FM saying no to negotiations on Friday was insane

https://www.yahoo.com/news/irans-foreign-minister-says-no-10...

all_factz · on Dec 8, 2023

So much this. It’s hard not to lose the forest for the trees — we are craftspeople after all — but at the end of the day the overall structure of a program is so much important than whether there’s a bit of duplication here or there. Better to let the structure emerge and then reduce duplication instead of trying to guess the right structure up front. And yeah I’m still not convinced SOLID is real (but I’m also not convinced classes are useful much of the time, for that matter).

zelphirkalt · on Dec 8, 2023

Letting the structure emerge requires people thinking in depth about the underlying principles of what the code does or should do.

As for classes: They are merely a construct in many languages, that people have come up with for organizing code and in my opinion a very debatable one. Some newer languages don't even deal in classes at all (Rust for example) and with good reason. If one says we need classes for having objects—No we don't. And objects are a concept to manage state over the lifetime of what the object represents, so that might be a worthy concept, but a class? I mostly find classes being used as a kind of modules, not actually doing anything but grouping functionality, that could simply be expressed by writing ... functions ... in a module, a construct for grouping that functionality.

I think what many people actually want is modularity, which in contrast to classes is a concept, that truly seems to be well accepted and almost every language tries to offer it in some way or another. It is just that many people don't realize, that this is what they are chasing after, when they write class after class in some mainstream language, that possibly does not even provide a good module system.

GTP · on Dec 8, 2023

> Some newer languages don't even deal in classes at all (Rust for example)

When you first write a struct, and then you below write an implement block for that struct containing functions that can only be applied to that struct, it really looks like a class to me, it just has a syntax that differs from what we're used to in other languages. Why wouldn't you call that a class?

zelphirkalt · on Dec 8, 2023

The idea is, that it decouples state from behavior, while a class tries to group that together. Other people can implement traits later for your struct. (I think Clojure has something similar.) They do not need to subclass your struct and in fact cannot even. This design encourages and kind of forces composition over inheritance.

I would not name it a class, because I don't want people to fall back into thinking: "Ah I know! Inheritance!".

GTP · on Dec 8, 2023

Well, having classes doesn't mean that you will necessarily use inheritance. There are programmers (ab)using it a lot, but for me, like for many others, classes are primarily a way to organize code: they provide a convinient way of representing something and specify functions that only make sense in the context of that something, while ensuring that you don't accidentally re-use those function for something else or lose track of which functions are meant for which component. They also provide a way to make collaboration easier, as you can agree on classes' interfaces and then each one goes on to implement it's own classes without having to worry about implementation details of other classes. It is true that you usually also have inheritance with classes, but I'm unsure if having it is a requirement to call something a class. IIRC from a theory perspective, classes are just an abstraction of some concepts, and the fact that a class' instance is called an object reflects this.

zogrodea · on Dec 8, 2023

I think the person you're replying to tried to address that point that classes are primarily a way to organise code when other possibly equally good or better options exist like modules. An F# module might (for example) look like below.

module Hello = let say() = "hi" // returns the string "hi"

There are mechanisms to encapsulate implementation details (private functions), to have multiple modules for different "domains" and specifying public contracts.

A class seems to imply more than that: each class specifies how to create an object with a constructor (where an object is something with the class's methods except modifying some state only owned by and accessible by the object itself).

GTP · on Dec 9, 2023

But in Rust you've got constructors as well. The only thing that really seems to be missing is inheritance. But I understand that one might say that classes without the possibility of using inheritance don't look fully right.

steveklabnik · on Dec 9, 2023

Rust does not have constructors.

GTP · on Dec 10, 2023

I would call this [0] a constructor, why wouldn't you?

[0] https://doc.rust-lang.org/nomicon/constructors.html

steveklabnik · on Dec 10, 2023

Wikipedia defines it as

> a constructor is a special type of function called to create an object.

Now, I don't think Wikipedia is always the authority on everything, but this is how I view them as well. A constructor is kinda like a callback that the language runtime invokes when an something is created. For example, in C++:

  class Foo {
   public:
    Foo() {
      // your code goes here
    }
  };

  int main() {
    Foo a; // constructor invoked here
  }

(please excuse formatting, there is no unified C++ style so I just picked one)

or in Ruby:

  class Foo
    def initialize
      # your code goes here
    end
  end

  Foo.new # constructor invoked here

These "constructors" in Rust don't work like that at all. They are not special member functions that are invoked when something is created; they are the syntax to create something.

GTP · on Dec 10, 2023

I think this is where things get muddy. On one hand, I think that there is still a function call going on under the hood, even if the syntax hides that. On the other, if you need some preprocessing before assigning the values to your struct's fields, you can still define a special member function that does whatever you need to figure out what the correct values are. In my view, what Rust is actually missing is the default constructor, because it will not initialize variables to some default value for you.

steveklabnik · on Dec 10, 2023

If you define a function that does some preprocessing before creating an instance in Rust, it is not a special member function that the language understands. It’s just a regular old function that happens to return an instance.

I do agree that it can be useful when discussing code to refer to these style functions with a more specific term, like a constructor, but that’s a colloquialism, and when discussing language features, being precise is important.

zozbot234 · on Dec 8, 2023

Classes are about data abstraction and encapsulation, which have nothing to do with implementation inheritance. They're about providing an interface that preserves any required invariants and does not depend directly on how the data is represented. A "structure" that either preserves useful invariants or is intended to admit of multiple possible representations that nonetheless expose the same outward behavior is effectively a class, whether you call it one or not.

zelphirkalt · on Dec 8, 2023

I think the discussion of what to call that is a bit pointless. For some people through their Java/C++/whatever-tinted glasses, these things will be classes. To others they might be called something else. You personally call them "classes". I personally do not. Rust the language does not. Lots of people behind the development of the language thought that structs and traits are better names.

I appreciate the programming language design behind them and hope, that Rust will not devolve into an ecosystem, where everyone thinks that they must put everything into classes or similar, needlessly maintaining state therein, requiring users to mutate that state through accessors and whatnot, when simply a couple of functions (and I mean strictly functions, not procedures) would have done the job.

I never stated, that I personally think classes necessarily mean inheritance. But guess who thinks so. Lots and lots of developers working with mainstream languages, where frequently inheritance is made use of, in combination with what those languages call a "class". That is why I am saying, that I don't want those people to fall back into their previous thinking and would not want to call them classes. It gives many many people the wrong ideas.

What other people call it is their own choice. I am merely stating my own preference here and probably the preference of the language designers, whom I mostly deem to be quite a lot more competent in drawing the distinction than myself.

zozbot234 · on Dec 8, 2023

> requiring users to mutate that state through accessors

There are plenty of cases where this makes sense, such as when working with sub-word data (bitfields), which is common in the embedded domain and often found as part of efficient code more generally. In fact, it may be more rare to have actual structs where one definitely wants to provide actual access (e.g. via pointers/refs) to the underlying data, and thus cannot just rely on getters/setters.

all_factz · on Dec 9, 2023

> Letting the structure emerge requires people thinking in depth about the underlying principles of what the code does or should do.

Right, yes, but those principles are often still very much in flux in the early days of a feature. Once a feature is more mature, it’s easier to confidently say what the code should do, and so that becomes a good time to refactor. Early on in the development lifecycle I think it’s rarely a good idea to worry about code duplication, underabstraction, etc.

And yes I agree with you that classes are an organizational concept with parallels in functional languages. Modularity is very important, but as you say there’s no reason that modularity implies classes. Sometimes I find classes to be ergonomic, and when they are using them makes sense, but plenty of other times a struct will do, as long as there’s some type of module system to keep different things different.

rewmie · on Dec 9, 2023

> Some newer languages don't even deal in classes at all (Rust for example) and with good reason. If one says we need classes for having objects—No we don't.

That's specious reasoning at best.

A class basically means a way to specify types that track specific states and behavior. Afterwards this materializes in other traits like interfaces and information-hiding.

Don't tell me Rust does not support those.

Also, C++ precedes Rust and it supports free functions and objects without member functions from day one. Rust is hardly relevant or unique in this regard. What Rust supports or not is not progress, nor is the definition of progress whatever Rust supports.

> And objects are a concept to manage state over the lifetime of what the object represents, so that might be a worthy concept, but a class?

You're showing some ignorance here. Classes and objects are entirely different concepts. An object is an instance of a class. A class is a blueprint of an object. This is programming 101.