More

rco8786 · 2026-02-26T23:43:59 1772149439

> My impression of Block was that it was mostly a one-trick pony (okay, two if you include CashApp) with a bunch of side initiatives that never seemed to pan out,

I worked at Block for ~6.5 years up until 2024. This is mostly correct.

They were the first to market for portable CC readers, and segued that into "high tech" POS systems which, to be fair, were significantly better than the available alternatives at the time. But flashy hardware design and iPads isn't really a moat, and the company never developed a great muscle for launching other initiatives. The strategy was "omnibus" - trying to do everything for everyone and win on the ecosystem efficiencies...but when none of your products are particularly standout it's hard to get and keep customers.

CashApp being the notable exception, because they gave the founder carte blanche. It was effectively 2 different companies operating under the $SQ ticker. They even had their own interview process for internal transfers. Although ironically the engineering standards on the CashApp side of the fence were significantly sloppier than on the Square side...to the point where I stopped using CashApp and stopped recommending it to friends once I transferred to that org and saw how the sausage was made.

paxys · 2026-02-27T00:04:29 1772150669

Exactly. Square was the first great checkout system, but now a decade and a half later every other system is good enough that retailers aren't going to pay extra for a flashier app.

tshaddox · 2026-02-27T14:45:57 1772203557

Over the last 10 years or so in SF and LA, I’ve seen so many countless POS systems at restaurants and small businesses that it’s difficult to believe that Square is anything more than 1 player in an enormous field.

And businesses I frequent over many years seem to change their POS systems often. I’ve always assumed that every year there are a bunch of new startups using their VC funds to give away free iPad minis. When the cheap hardware breaks or the software company goes under, there’s always a new one to take its place.

izacus · 2026-02-27T16:22:03 1772209323

What's wrong with being 1 player in enormous field? Does everything need to be a monopoly in US nowadays?

raw_anon_1111 · 2026-02-27T00:48:32 1772153312

And before people like my barber would have had a square reader. With NFC in modern phones, they just use that

jrjeksjd8d · 2026-02-27T15:01:10 1772204470

Square is a great option for selling crafts at a market once a month. I tried to use it for a proper multi-channel retail store and it immediately fell apart:

- the e-commerce integration (Weebly?) is very limited and the resulting sites are dog slow

- the POS itself and the backend don't work when you have hundreds of SKUs and many variants

- there's very little customization or support

My business wasn't huge but we were doing ~300k revenue annually online and in-store. We started on Square, tried Lightspeed (also garbage) and finally ended up on Shopify (best of a bad lot).

Despite making noise about "supporting small business" Shopify makes most of their money from e-commerce for giant brands. They've tried to juice returns from small customers with merchant cash advances but my sense is they make more doing professional services for big e-commerce brands

mattmaroon · 2026-02-27T01:51:19 1772157079

It’s not extra and their hardware is still far better than the competition. Square is still awesome in the small business PoS space. Their lead has not shrunk.

paxys · 2026-02-27T03:19:31 1772162371

Toast has already caught up in market share, and dominates the restaurant industry. Square's numbers have been stagnant for many years.

And more importantly, the entire premise when Square launched was that app-based "cloud" PoS systems would replace all traditional cash registers. Except now 15 years later that simply hasn't happened. Existing players in the space all caught up and shipped chip and NFC readers to their retailers, and that's all that was needed.

ryukoposting · 2026-02-27T07:35:48 1772177748

The ubiquity of NFC has made specialized hardware irrelevant for entire industries. It's set dressing for small businesses. The ruggedized enclosures and swiveling touchscreens are cute, but that's not a moat.

Their lead absolutely has shrunk. In the mid-late 2010s it was either Square, or a bevy of shovelware Windows POS systems loosely stitched together with a tablet for rewards and maybe grubhub. Clover and Toast are both regular sights in that space now.

rco8786 · 2026-02-27T12:23:24 1772195004

> Their lead has not shrunk

It has though, by a lot. Toast in particular has eaten Square's lunch in the restaurant industry and now they're expanding to retail. Even NCR has caught up, along with a long tail of newer competitors eating away at market share.

There was a window of time where Square was the default choice for small biz POS and that is most definitely not the case anymore.

Marsymars · 2026-02-27T02:28:06 1772159286

You might think Square has better hardware/software, but they absolutely are extra cost-wise for small businesses if you compare them to e.g. Helcim.

simonw · 2026-02-27T00:27:25 1772152045

Did any of the blockchain initiatives ever go anywhere? I understood that's why they renamed the company to Block, but did that end up a similar rebrand to Facebook -> Meta?

ursuscamp · 2026-02-27T00:34:46 1772152486

They are heavily invested in Bitcoin and still offer and improve their Bitcoin services. It’s not really “blockchain.” They’re not a crypto company. They are ideologically dedicated to Bitcoin.

daxfohl · 2026-02-27T00:58:30 1772153910

I don't think so. I know a couple people that worked in TBD (the bitcoin org) and everyone said it was directionless. Eventually the CTO ~abandoned that org and took on that Goose AI project.

Scoundreller · 2026-02-27T07:53:49 1772178829

The bought $170m of bitcoin at $50k a pop when their stock was $250, now it’s $67k and their stock is $67 (in after hours trading), so I guess it went pretty far in that respect.

rco8786 · 2026-02-27T02:31:46 1772159506

The only thing it served was to distract Jack from building real products.

rco8786 · 2026-02-25T12:05:15 1772021115

Yes, this. It's unfortunate that anthropic dropped this and it's also exactly how the system is supposed to work. Companies don't regulate themselves, the government regulates the companies.

Now, you may notice that the government is also choosing not to regulate these companies...which is another matter altogether.

ozmodiar · 2026-02-25T12:48:03 1772023683

It's so much worse than that. The government actively encourages a lack of business ethics. Heck, it started the term with a crypto rug pull. Money continues to funnel upward to all the worst players, and watchdogs are being targeted and destroyed. Even if you get new people in power, you're going to find the upper echelons completely full of outlandishly wealthy, morally bankrupt individuals that are very politically active. And now they have access to all of our communications and an AI to sift through it looking for dissent (or to spark its own). I guess this is the end game of "move fast and break things." The situation was never good, but it continues to get worse at an alarming rate.

mschuster91 · 2026-02-25T13:23:04 1772025784

> Heck, it started the term with a crypto rug pull

If you ask me... that wasn't a rug pull, at least not in the intent - it more was a way for foreign actors to funnel money directly to Trump and his family without any trace.

lupire · 2026-02-25T14:17:38 1772029058

Cryptocurrency is the most traceable money in the world. Cryptocurrency is for implusible deniability, not untraceability.

bumby · 2026-02-25T13:38:23 1772026703

There is plenty of precedent that companies are expected to regulate themselves. If you are in the US and perform an engineering role without a license or without working under someone with a license, it’s because of an “industrial exemption.” The premise is that companies have enough standards and processes in place to mitigate that risk.

However, there is also plenty of evidence that this setup may no longer work. It seems like the norm has shifted, where companies no longer think it’s their duty to manage risk, only to chase $$$. When coupled with anti-government rhetoric, it effectively socializes the risk to the public but not the profits.

rco8786 · 2026-02-25T22:27:37 1772058457

The entire system you just described is government regulation.

> without a license

A government issued license.

> it’s because of an “industrial exemption.”

A government allowed exemption.

Etc.

Agree with your second paragraph.

bumby · 2026-02-26T04:00:40 1772078440

Your point isn’t wrong if you take an extreme libertarian view of things, but it’s not quite how it’s usually interpreted colloquially.

“When the people who make the rules say there are no rules, that means they’re making rules” is an oddly circular take for most people.

lupire · 2026-02-25T14:22:59 1772029379

Am exemption from PE stamping (misguided as it maybe) does not mean unregulated. There are still regulations on designs and builds.

bumby · 2026-02-25T14:28:56 1772029736

True to an extent, but those regulations tend to downstream of bad things happening.

The exemption means “self-regulation” which is what the OP was speaking to. There are industrial standards, for example, but that’s not a governing body. You can create a design that goes against a standard and there’s nothing to stop you from releasing it to the public. The same can’t be said for those who require licenses and stamped designs. There’s also no explicit individual ethics codes in exempted industries. In contrast, a stamped design is saying the design adheres to good standards.

Apropos to HN, somebody could write safety critical software with emergency braking delays because of nuisance alarms and put it on the street without any licensed engineer taking responsibility for it. The governance only comes after an accident and an NTHSB investigation.

bigbadfeline · 2026-02-25T18:29:48 1772044188

> anthropic dropped this and it's also exactly how the system is supposed to work. Companies don't regulate themselves, the government regulates the companies.

In this case, it's exactly how it's NOT supposed to work because there's no government regulation concerning the issue. It would be bad looks to have regulation that mandates LESS safety thus the issue was forced on commercial grounds.

I called it yesterday, there was never any doubt in my mind how this would end, and it did in less than 24 hours:

https://news.ycombinator.com/item?id=47144609

rco8786 · 2026-02-25T22:28:24 1772058504

> because there's no government regulation concerning the issue

Yea, see the next sentence in my post :-/

rco8786 · 2026-02-25T11:57:57 1772020677

Not really. You still have to be an accredited investor AND financially savvy enough to have the awareness of what syndicate deals are and how to find them and participate.

walletdrainer · 2026-02-25T13:07:15 1772024835

> You still have to be an accredited investor

Which means that you have to be able to repeat after me “I am an accredited investor”, that’s the actual full list of requirements.

bix6 · 2026-02-25T14:01:48 1772028108

No, the actual full list of requirements is on the SEC website site.

inemesitaffia · 2026-02-25T20:55:08 1772052908

I think they are trying to point out the seller may not be required to verify. Only ask.

Like the age check on logged out YouTube videos

walletdrainer · 2026-02-25T22:10:44 1772057444

No, those are not actual requirements.

rco8786 · 2026-02-25T22:29:31 1772058571

Legally speaking that is not correct. Maybe in practice you can do that, but it opens you up to very real risk.

To your bog standard human being, trading Stripe stock has significant barriers vs trading a public stock.

walletdrainer · 2026-02-27T19:18:47 1772219927

What “very real risk”?

Not legal risk, certainly.

“Risk” in economic terms? Yeah that’s the whole purpose of claiming to be an accredited investor.

rco8786 · 2026-02-22T16:27:26 1771777646

Stripe has somewhere around 3000-3500 engineers on staff, so it's less than one PR a week spread across the org.

ericyd · 2026-02-22T17:17:01 1771780621

Thanks, I realized after I wrote it that the size of their staff was really the variable I was missing. Agreed that's not a remarkably high rate with such a large engineering org.

rco8786 · 2026-02-20T20:09:39 1771618179

I just logged in to mine to see, I also can't remember the last time I looked at my news feed. My experience isn't quite as bad as OPs, but certainly plenty of AI slop and lots and lots of accounts that I don't follow and have never heard of.

rco8786 · 2026-02-20T12:36:58 1771591018

I think we're making a mistake by shoving all of this into the cloud rather than building tooling around local agents (worktrees, containers, as mentioned as "difficult" in the post). I think as an industry we just reach for cloud like our predecessors reached for IBM, without critical thought about what's actually the right tool for the job.

If you can manage docker containers in a cloud, you can manage them on your local. Plus you get direct access to your own containers, local filesystems and persistence, locally running processes, quick access for making environmental tweaks or manual changes in tandem with your agents, etc. Not to mention the cost savings.

cryptonector · 2026-02-20T16:23:02 1771604582

The thing is that startups often don't have the time or capital to build a data center even though public cloud is just more expensive. If you're bootstrapping a business then it makes sense. My advice would be to always use only those features of the public cloud that you can also use on your private cloud, such as Kubernetes.

martinald · 2026-02-20T19:33:21 1771616001

How do people think that's the only two options (AWS/cloud or build a datacenter)? It astounds me.

There's _so_ many providers of 'bare metal' dedicated servers - Hetzner and OVH come up a lot, but _before_ AWS there was ev1servers (anyone remember them?).

andersmurphy · 2026-02-20T20:16:06 1771618566

Because, a lot of money went into cloud marketing to convince us those are the only two options.

Tech is for all intents and purposes a planed economy (we are in the middle of the LLM five year plan comrade).

rco8786 · 2026-02-20T17:32:38 1771608758

I am confused. I am not saying anything about building a datacenter

b40d-48b2-979e · 2026-02-20T12:53:12 1771591992

You also get all the risk of exposing your network and the cost of maintenance for your own datacenter.

rco8786 · 2026-02-20T12:26:25 1771590385

The regret was about the cost of the premium support

rco8786 · 2026-02-20T12:20:13 1771590013

I'm sure there are lots of Stripe engineers that cruise the comments here. Anyone care to provide some color on how this is actually working? It's not a secret that agents can produce tons and tons of code on their own. But is this code being shipped? Maintained? Reviewed?

etothet · 2026-02-20T13:16:48 1771593408

Part 1 is linked in this article and explains a bit: “Minions are Stripe’s homegrown coding agents. They’re fully unattended and built to one-shot tasks. Over a thousand pull requests merged each week at Stripe are completely minion-produced, and while they’re human-reviewed, they contain no human-written code.”

I could be wrong, but my educated guess is that, like many companies, they have many low hanging fruit tasks that would never make it into a sprint or even somewhat larger tasks that are straight forward to define and implement in isolation.

dakolli · 2026-02-20T12:36:21 1771590981

The few guys who they haven't laid off are too busy reviewing and being overworked, doing the work of 10 to scroll HN. Gotta get their boss another boat, AI is so awesome!

malfist · 2026-02-20T13:55:18 1771595718

Stripe hasn't had a layoff in a good while. Stripe is hiring like mad and is planning on growing engineering significantly. Your comment isn't grounded in reality

2026-02-20T14:33:18 1771597998

[dead]

snayan · 2026-02-20T16:07:14 1771603634

Seems like they've been pretty successful with this method? Why do you think it's bullshit?

rileymichael · 2026-02-20T21:44:52 1771623892

successful how? the only metric i see is # of pull requests which means nothing. hell, $dayJob has hundreds of PRs generated weekly from renovate, i18n integrations, etc. with no LLM in the mix!

rco8786 · 2026-02-19T11:33:02 1771500782

> Has anyone seen a production system that actually does claim-level verification before generation?

"Claim level" no, but search engines have been scoring sources on reliability and authority for decades now.

amabito · 2026-02-19T11:44:23 1771501463

Right — search engines have long had authority scoring, link graphs, freshness signals, etc.

The interesting gap is that retrieval systems used in LLM pipelines often don't inherit those signals in a structured way. They fetch documents, but the model sees text, not provenance metadata or confidence scores.

So even if the ranking system “knows” a source is weak, that signal doesn’t necessarily survive into generation.

Maybe the harder problem isn’t retrieval, but how to propagate source trust signals all the way into the claim itself.

rco8786 · 2026-02-16T13:15:39 1771247739

> it's about catching when it goes off the rails before it makes a mess

The latest "meta" in AI programming appears to be agent teams (or swarms or clusters or whatever) that are designed to run for long periods of time autonomously.

Through that lens, these changes make more sense. They're not designing UX for a human sitting there watching the agent work. They're designing for horizontally scaling agents that work in uninterrupted stretches where the only thing that matters is the final output, not the steps it took to get there.

That said, I agree with you in the sense that the "going off the rails" problem is very much not solved even on the latest models. It's not clear to me how we can trust a team of AI agents working autonomously to actually build the right thing.

g947o · 2026-02-16T13:26:21 1771248381

None of those wild experiments are running on a "real", existing codebase that is more than 6 months old. The thing they don't talk about is that nobody outside these AI companies wants to vibe code with a 10 year old codebase with 2000 enterprise customers.

As you as you start to work with a codebase that you care about and need to seriously maintain, you'll see what a mess these agents make.

GoatInGrey · 2026-02-16T17:20:10 1771262410

Even on codebases within the half-year age group, these LLMs often do perform nasty (read: ungodly verbose) implementations that become a maintainability nightmare. Even for the LLMs that wrote it all in the first place. I know this because we've had a steady trickle of clients and prospects expressing "challenges around maintainability and scalability" as they move toward "production readiness". Of course, asking if we can implement "better performing coding agents". As if improved harnessing or similar guardrails can solve what is in my view, a deeper problem.

The practical and opportunistic response is too tell them "Tough cookies" and watch the problems steadily compound into more lucrative revenue opportunities for us. I really have no remorse for these people. Because half of them were explicitly warned against this approach upfront but were psychologically incapable of adjusting expectations or delaying LLM deployment until the technology proved itself. If you've ever had your professional opinion dismissed by the same people regarding you as the SME, you understand my pain.

I suppose I'm just venting now. While we are now extracting money from the dumbassery, the client entitlement and management of their emotions that often comes with putting out these fires never makes for a good time.

buschleague · 2026-02-16T20:12:37 1771272757

This is exactly why enforcement needs to be architectural. The "challenges around maintainability and scalability" your clients hit exist because their AI workflows had zero structural constraints. The output quality problem isn't the model, it's the lack of workflow infrastructure around it.

datsci_est_2015 · 2026-02-16T20:44:46 1771274686

Is this not just “build a better prompt” in more words?

At what point do we realize that the best way to prompt is with formal language? I.e. a programming language?

semiquaver · 2026-02-16T21:01:23 1771275683

No, the suite of linters, test suite and documentation in your codebase cannot be equated to “a better prompt” except in the sense that all feedback of any kind is part of what the model uses to make decisions about how to act.

datsci_est_2015 · 2026-02-16T21:25:34 1771277134

A properly set up and maintained codebase is the core duty of a software engineer. Sounds like the great-grandparent comment’s client needed a software engineer.

oblio · 2026-02-16T23:34:18 1771284858

What if LLMs, at the end of the day are machines, so for now generally dumber than humans and the best they can provide are at most statistically median implementantions (and if 80% of code out there is crap, the median will be low)?

Now that's a scary thought that basically goes against "1 trillion dollars can't be wrong".

Now, LLMs are probably great range extenders, but they're not wonder weapons.

lossyalgo · 2026-02-17T00:56:47 1771289807

Also who is to say what is actually crap? Writing great code is completely dependent on context. An AI could exclusively be trained on the most beautiful and clean code in the world, yet if it chooses the wrong paradigm in the wrong context, it doesn't matter how beautiful that code is - it's still gonna be totally broken code.

krastanov · 2026-02-16T14:05:21 1771250721

I maintain serious code bases and I use LLM agents (and agent teams) plenty -- I just happen to review the code they write, I demand they write the code in a reviewable way, and use them mostly for menial tasks that are otherwise unpleasant timesinks I have to do myself. There are many people like me, that just quietly use these tools to automate the boring chores of dealing with mature production code bases. We are quiet because this is boring day-to-day work.

E.g. I use these tools to clean up or reorganize old tests (with coverage and diff viewers checking of things I might miss), update documentation with cross links (with documentation linters checking for errors I miss), convert tests into benchmarks running as part of CI, make log file visualizers, and many more.

These tools are amazing for dealing with the long tail of boring issues that you never get to, and when used in this fashion they actually abruptly increase the quality of the codebase.

g947o · 2026-02-16T15:23:39 1771255419

It's not called vibe coding then.

jmalicki · 2026-02-16T16:17:01 1771258621

Oh you made vibe coding work? Well then it's not vibe coding.

But any time someone mentions using AI without proof of success? Vibe coding sucks.

GoatInGrey · 2026-02-16T17:25:12 1771262712

No, what the other commenter described is narrowly scoped delegation to LLMs paired with manual review (which sounds dreadfully soul-sucking to me), not wholesale "write feature X, write the unit tests, and review the implementation for me". The latter is vibe-coding.

krastanov · 2026-02-16T17:53:12 1771264392

Reviewing a quick translation of a test to a benchmark (or another menial coding tasks) is way less soul-sucking than doing the menial coding by yourself. Boring soul-sucking tasks are an important thankless part of OSS maintenance.

I concur it is different from what you call vibecoding.

unshavedyak · 2026-02-16T17:50:24 1771264224

Sidenote, i do that frequently. I also do varying levels of review, ie more/less vibe[1]. It is soul sucking to me.

Despite being soul sucking, I do it because A: It lets me achieve goals despite lacking energy/time for projects that don't require the level of commitment or care that i provide professionally. B: it reduces how much RSI i experience. Typing is a serious concern for me these days.

To mitigate the soul sucking i've been side projecting better review tools. Which frankly i could use for work anyway, as reviewing PRs from humans could be better too. Also inline with review tools, i think a lot of soul sucking is having to provide specificity, so i hope to be able to integrate LLMs into the review tool and speak more naturally to it. Eg i belive some IDEs (vscode? no idea) can let Claude/etc see the cursor, so you can say "this code looks incorrect" without needing to be extremely specific. A suite of tooling that improves this code sharing to Claude/etc would also reduce the inane specificity that seems to be required to make LLMs even remotely reliable for me.

[1]: though we don't seem to have a term for varying amounts of vibe. Some people consider vibe to be 100% complete ignorance of the architecture/code being built. In which case imo nothing i do is vibe, which is absurd to me but i digress.

EnPissant · 2026-02-16T17:07:06 1771261626

> According to Karpathy, vibe coding typically involves accepting AI-generated code without closely reviewing its internal structure, instead relying on results and follow-up prompts to guide changes.

What you are doing is by definition not vibe coding.

lukeschlather · 2026-02-16T17:17:27 1771262247

It's not vibe coding if you personally review all the diffs for correctness.

peyton · 2026-02-16T14:17:19 1771251439

Yeah esp. the latest iterations are great for stuff like “find and fix all the battery drainers.” Tests pass, everyone’s happy.

hp197 · 2026-02-16T16:03:17 1771257797

(rhetorical question) You work at Apple? :p

JPKab · 2026-02-16T14:18:29 1771251509

I work at a company with approximately $1 million in revenue per engineer and multiple 10+ year old codebases.

We use agents very aggressively, combined with beads, tons of tests, etc.

You treat them like any developer, and review the code in PRs, provide feedback, have the agents act, and merge when it's good.

We have gained tremendous velocity and have been able to tackle far more out of the backlog that we'd been forced to keep in the icebox before.

This idea of setting the bar at "agents work without code reviews" is nuts.

otabdeveloper4 · 2026-02-16T19:53:11 1771271591

> We have gained tremendous velocity and have been able to tackle far more out of the backlog that we'd been forced to keep in the icebox before.

Source? Proofs? It's not the first, second or even third round on this rodeo.

In other words, notto disu shittu agen.

groundzeros2015 · 2026-02-16T14:33:43 1771252423

Why are you using experience and authoritative framing about a technology we’ve been using for less than 6 months?

kasey_junk · 2026-02-16T15:12:55 1771254775

The person they are responding with dictated an authoritative framing that isn’t true.

I know people have emotional responses to this, but if you think people aren’t effectively using agents to ship code in lots of domains, including existing legacy code bases, you are incorrect.

Do we know exactly how to do that well, of course not, we still fruitlessly argue about how humans should write software. But there is a growing body of techniques on how to do agent first development, and a lot of those techniques are naturally converging because they work.

groundzeros2015 · 2026-02-16T15:17:32 1771255052

I think programming effectiveness is inherently tied to the useful life of software, and we will need to see that play out.

This is not to suggest that AI tools do not have value but that “I just have agents writing code and it works great!” Has yet to hit its test.

garciasn · 2026-02-16T15:57:13 1771257433

The views I see often shared here are typical of those in the trenches of the tech industry: conservative.

I get it; I do. It's rapidly challenging the paradigm that we've setup over the years in a way that it's incredibly jarring, but this is going to be our new reality or you're going to be left behind in MOST industries; highly regulated industries are a different beast.

So; instead of just out-of-hand dismissing this, figure out the best ways to integrate agents into your and your teams'/companies' workstreams. It will accelerate the work and change your role from what it is today to something different; something that takes time and experience to work with.

benterix · 2026-02-16T16:20:23 1771258823

> I get it; I do. It's rapidly challenging the paradigm that we've setup over the years in a way that it's incredibly jarring,

But it's not the argument. The argument is that these tools provide lower-quality output and checking this output often takes more time than doing this work oneself. It's not that "we're conservative and afraid of changes", heck, you're talking to a crowd that used to celebrate a new JS framework every week!

There is a push to accept lower quality and to treat it as a new normal, and people who appreciate high-quality architecture and code express their concern.

joquarky · 2026-02-16T23:41:31 1771285291

"Find any inconsistencies that should be addressed in this codebase according to DRY and related best practices"

This doesn't hurt to try and will give valuable and detailed feedback much more quickly than even an experienced developer seeing the project for the first time.

benterix · 2026-02-17T09:41:17 1771321277

These kinds of instructions are the main added value of LLMs and I use them every day. Even though 30%-60% the output is wrong/irrelevant, the rest is helpful enough. After the human reviews it, the overall quality of the codebase increases, not decreases. This is on the opposite end of the spectrum when compared to agentic coding, though.

thesz · 2026-02-16T16:19:53 1771258793

  > It will accelerate the work and change your role from what it is today to something different;

We yet to see if different is good.

My short experience with LLM reviewing my code is that LLM's output is overly explanatory and it slows me down.

  > something that takes time and experience to work with.

So you invite us to participate in sunken cost fallacy.

joquarky · 2026-02-16T23:44:05 1771285445

Tell it to summarize?

thesz · 2026-02-17T15:49:18 1771343358

I cannot because these reviews are comments in Github PRs. I have to read them.

groundzeros2015 · 2026-02-16T16:09:08 1771258148

I don’t doubt that companies are willing to try low quality things. They play with these processes all the time. Maybe the whole industry will try it.

I’m available for consulting when you need something done correctly.

democracy · 2026-02-16T19:50:45 1771271445

Also people need to be more specific about technologies/tasks they do. Otherwise it's apples to oranges.

JPKab · 2026-02-16T16:04:51 1771257891

6 months?

I've been using LLMs to augment development since early December 2023. I've expanded the scope and complexity of the changes made since then as the models grew. Before beads existed, I used a folder of markdown files for externalized memory.

Just because you were late to the party doesn't mean all of us were.

rstuart4133 · 2026-02-17T00:45:55 1771289155

> Just because you were late to the party doesn't mean all of us were.

It wasn't a party I liked back in 2023. I'm just repeating the same stuff I see said over and over again here, but there has been a step change with Opus 4.5.

You can still it in action now because the other models are still where Opus was at a while ago. I recently needed to make small change to script I was using. It is a tiny (50 line) script written with the help of AI's ages ago, but was subtly wrong in so many ways. It's now become clear neither the AI's (I used several and cross checked) nor myself had a clue about what we were dealing with. The current "seems to work" version was created after much blood caused by misunderstandings was spilt, exposing bugs that had to be fixed.

I asked Claude 4.6 to fix yet another misunderstanding, and the result was a patch changing the minimum number of lines to get the job done. Just reviewing such a surgical modification was far easier than doing it myself.

I gave exactly the same prompt to Gemini. The result was a wholesale rearrangement of the code. Maybe it was good, but the effort to verify that was far lager than just doing it myself. It was a very 2023 experience.

The usual 2023 experience for me was ask an AI write some greenfield code, and get a result that looked like someone had changed variable names in something they found on the web after a brief search for code that looked like it might do a similar job. If you got lucky, it might have found something that was indeed very similar, but in my case that was rare. Asking it to modify code unlike something it had seen before was like asking someone to poke your eyes with a stick.

As I said, some of the organisers of this style of party seem have gotten their act together, so now it is well worth joining their parties. But this is a newish development.

dboreham · 2026-02-16T15:38:09 1771256289

If you hired a person six months ago and in that time they'd produced a ton of useful code for your product, wouldn't you say with authoritative framing that their hiring was a good decision?

groundzeros2015 · 2026-02-16T15:51:20 1771257080

It would, but I haven’t seen that. What I’ve seen is a lot of people setting up cool agent workflows which feel very productive, but aren’t producing coherent work.

This may be a result of me using tools poorly, or more likely evaluating merits which matter less than I think. But I don’t think we can see that yet as people just invented these agent workflows and we haven’t seen it yet.

Note that the situation was not that different before LLMs. I’ve seen PMs with all the tickets setup, engineers making PRs with reviews, etc and not making progress on the product. The process can be emulated without substantive work.

democracy · 2026-02-16T19:52:41 1771271561

Why is it always "tons of code"? Unless you are paid by the line of code writin "tons of code" makes no sense.

hickelpickle · 2026-02-16T17:59:08 1771264748

If there is one thing I have seen is that there is a subset of intellectual people will still be adverse to learning new tools, hang to ideological beliefs (I feel this though, watching programming as you know it die in a way, kinda makes you not want to follow it) and would prefer to just be lazy and not properly dogfood and learn their new tooling.

I'm seeing amazing result to with agents, when provided an well formed knowledge base and directed through each piece of work like its a sprint. Review and iron out scope requirements, api surface/contract, have agents create multi phase implementation plans and technical specifications in a share dev directory and to make high quality changes logs, document future consideration and any bugs/issues found that can be deferred. Every phase is addressed with a human code review along with gemini who is great at catching drift from spec and bugs in less obvious places.

While I'm sure an enterprise code base could still be an issue and would require even more direction (and opus I wont let touch java, it codes like an enterprise java greybeard who loves to create an interface/factory for everything), I think that's still just a tooling issues.

I'm not of the super pro AI camp, but having followed its development and used it throughout. For the first time I am actual amazed and bothered, and convinced if people dont embrace these tools, they will be left behind. No they dont 10-100x a jr dev, but if someone has proper domain knowledge to direct the agent, performs dual research with it to iron things out with the human actually understanding the problem space, 2-5x seems quite reasonable currently if driven by a capable developer. But this just move the work to review and documentation maintenance/crafting. Which has its own fatigue and is less rewarding for a programmers mind who loves to solve challenges and gets dopamine from it .

But given how man people are adverse...I dont think anyone who embraces it is going to have job security issues and be replaced, but here are many capable engineers who might due to their own reservations. I'm amazed by how many intelligent and capable people try llms/agents like a political straw man, there is no reasoning with them. They say vibe coding sucks (it does for anything more than a small throw away that wont be maintained), yet their examples for agents/llm not working is it can't just take a prompt and produce the best code ever and automatically and manifest the knowledge needed to work on their codebase. You still need to put in effort and learn to actually perform the engineering with the tools, but if it doesnt take a paragraph with no AGENTS.md and turn it into a feature or bug fix they are not good to them. Yeah they will get distracted and fuck up, just like if you throw 9/10 developers in the same situation and told them to get to work with no knowledge of the code base or domain and have their pr in by noon.

rco8786 · 2026-02-16T13:38:06 1771249086

That is also my experience. Doesn't even have to be a 10 year old codebase. Even a 1 year old codebase. Any one that is a serious product that is deployed in production with customers who rely on it.

Not to say that there's no value in AI written code in these codebases, because there is plenty. But this whole thing where 6 agents run overnight and "tada" in the morning with production ready code is...not real.

zerkten · 2026-02-16T14:17:52 1771251472

I don't believe that devs are the audience. They are pushing this to decision makers where they want them to think that the state of the art is further ahead than it is. These folks then think about how helpful it'd be to have 20% of that capability. When there is so much noise in the market, and everyone seems to be overtaking everyone else it, this kind of approach is the only one that gets attention.

Similarly, a lot of the AGI-hype comments exist to expand the scope of the space. It's not real, but it helps to position products and win arguments based on hypotheticals.

pjc50 · 2026-02-16T14:48:22 1771253302

Also anything that doesn't look like a SaaS app does very badly. We had an internal trial at embedded firmware and concluded the results were unsalvageably bad. It doesn't help that the embedded environment is very unfriendly to standard testing techniques, as well.

joquarky · 2026-02-16T23:54:02 1771286042

You will need to build an accessible knowledge base for the topics for which the models have not had extensive training.

Proprietary embedded system documentation is not exactly ubiquitous. You must provide reference material and guardrails where the training is weakest.

This applies to everything in ML: it will be weakest at the edges.

JeremyNT · 2026-02-16T15:24:30 1771255470

I feel like you could have correctly stated this a few months ago, but the way this is "solved" is by multiple agents that babysit each other and review their output - it's unreasonably effective.

You can get extremely good results assuming your spec is actually correct (and you're willing to chew through massive quantities of tokens / wait long enough).

nicoburns · 2026-02-16T18:04:32 1771265072

> You can get extremely good results assuming your spec is actually correct

Is it ever the case that the spec is entirely correct (and without underspecified parts)? I thought the reason we write code is because it's much easier to express a spec as code than it is to get a similar level of precision in prose.

JeremyNT · 2026-02-16T18:22:07 1771266127

I think this is basically the only SWE-type job that exists beyond the (relatively near) future: finding the right spec and feeding it to the bots. And in this way I think even complete laypeople will be able to create software using the bots, but you'd still want somebody with a deeper understanding in this role for serious projects.

The bots even now can really help you identify technical problems / mistakes / gaps / bad assumptions, but there's no replacing "I know what the business wants/needs, and I know what makes my product manager happy, and I know what 'feels' good" type stuff.

otabdeveloper4 · 2026-02-16T19:59:09 1771271949

> finding the right spec and feeding it to the bots

Also known as "compiling source code".

ldng · 2026-02-16T15:36:20 1771256180

And unreasonably expensive unless you are Big Corp. Die startups, die. Welcome to our Cyberpunk overlords.

whateveracct · 2026-02-16T17:17:50 1771262270

Companies will just shift money from salaries to their Anthropic bill - what's the problem?

oblio · 2026-02-17T09:07:28 1771319248

Best not build any products that compete with Anthropic!

JeremyNT · 2026-02-16T18:24:33 1771266273

Or hey, the VCs can self-deal by funding new startups that buy bot time from AI firms the same VCs already fund.

No pesky developers siphoning away equity!

flemhans · 2026-02-16T22:44:42 1771281882

My Claude Code has been running weeks on end churning through a huge task list almost unattended on a complex 15 yr old code base, auto-committing thousands of features. It is high quality code that will go live very soon.

rco8786 · 2026-02-17T15:50:48 1771343448

> that will go live very soon.

I hope that you are successful!

oblio · 2026-02-16T23:36:35 1771284995

Awesome! Which application or service?

lanstin · 2026-02-16T19:32:47 1771270367

The gas town discord has two people that are doing transformation of extremely legacy in house Java frameworks. Not reporting great success yet but also probably work that just wouldn’t be done otherwise.

democracy · 2026-02-16T19:44:13 1771271053

Oh that means you don't know how to use AI properly. Also its only 2026 imagine what AI agents can do in a few years /s

pzs · 2026-02-16T14:31:52 1771252312

Related question: how do we resolve the problem that we sign a blank cheque for the autonomous agents to use however many tokens they deem necessary to respond to your request? The analogy from team management: you don't just ask someone in your team to look into something only to realize three weeks later (in the absence of any updates) that they got nowhere with a problem that you expected to take less than a day to solve.

EDIT: fixed typo

rco8786 · 2026-02-16T14:36:02 1771252562

We'll have to solve for that sometime soon-ish I think. Claude Code has at least some sort of token estimation built-in to it now. I asked it to kick off a large agent team (~100 agents) to rewrite a bunch of SQL queries, one per agent. It did the first 10 or so, then reported back that it would cost too much to do it this way...so it "took the reins" without my permission and tried to convert each query using only the main agent and abandoned the teams. The results were bad.

But in any case, we're definitely coming up on the need for that.

pjc50 · 2026-02-16T14:51:26 1771253486

> blank cheque

The Bing AI summary tells me that AI companies invested $202.3 billion in AI last year. Users are going to have to pay that back at some point. This is going to be even worse as a cost control situation than AWS.

lossyalgo · 2026-02-17T01:03:34 1771290214

Didn't you hear? Ads are coming! (well not to Claude, because I guess they plan to somehow get unlimited SV funding?!)

thephyber · 2026-02-16T15:14:28 1771254868

> Users are going to have to pay that back at some point.

That’s not how VC investments work. Just because something costs a lot to build doesn’t mean that anyone will pay for it. I’m pretty sure I haven’t worked for any startup that ever returned a profit to its investors.

I suspect you are right in that inference costs currently seem underpriced so users will get nickel-and-dinked of a while until the providers leverage a better margin per user.

Some of the players are aiming for AGI. If they hit that goal, the cost is easily worth it. The remaining players are trying to capture market share and build a moat where none currently exists.

oblio · 2026-02-17T09:38:37 1771321117

I'm so glad Airbnb, Uber, Netflix, etc aren't both hiking their prices and enshittifying via ads, dark patterns, etc.

LLMs are not AGI and everyone is starting to see it. We need new basic research for that. Think fusion reactors.

tsunamifury · 2026-02-16T15:54:11 1771257251

What planet are you living on and how do I get there.

Yes currency is very rarely at times exchanged at a loss for power but rarely not for more currency down the road.

Kye · 2026-02-16T14:34:39 1771252479

An AI product manager agent trained on all the experience of product managers setting budgets for features and holding teams to it. Am I joking? I do not know.

peab · 2026-02-16T15:17:50 1771255070

This seems pretty in line with how you’d manage a human - you give it a time constraint. a human isn't guaranteed to fix a problem either, and humans are paid by time

the_harpia_io · 2026-02-16T15:02:55 1771254175

yeah I think that's exactly the disconnect - they're optimizing for a future where agents can actually be trusted to run autonomously, but we're not there yet. like the reliability just isn't good enough to justify hiding what it's doing. and honestly I'm not sure we'll get there by making the UX worse for humans who are actively supervising, because that's how you catch the edge cases that training data misses. idk, feels like they're solving tomorrow's problem while making today's harder

buschleague · 2026-02-16T20:07:00 1771272420

We run agent teams (Navigator/Driver/Reviewer roles) on a 71K-line codebase. The trust problem is solved by not trusting the agents at all. You enforce externally. Python gates that block task completion until tests pass, acceptance criteria are verified, and architecture limits are met. The agents can't bypass enforcement mechanisms they can't touch. It's not about better prompts or more capable models. It's about infrastructure that makes "going off the rails" structurally impossible.

oblio · 2026-02-17T09:39:55 1771321195

Which service or application?

twalichiewicz · 2026-02-16T20:13:27 1771272807

I think this is exactly the crux: there are two different UX targets that get conflated.

In operator/supervisor mode (interactive CLI), you need high-signal observability while it’s running so you can abort or re-scope when it’s reading the wrong area or compounding assumptions. In batch/autonomous mode (headless / “run overnight”), you don’t need a live scrollback feed, but you still need a complete trace for audit/debug after the fact.

Collapsing file paths into counters is a batch optimization leaking into operator mode. The fix isn’t “verbose vs not” so much as separating channels: keep a small status line/spine (phase, current target, last tool call), keep an event-level trace (file paths / commands / searches) that’s persisted and greppable, and keep a truly-verbose mode for people who want every hook/subagent detail.

simianwords · 2026-02-16T14:35:51 1771252551

>The latest "meta" in AI programming appears to be agent teams (or swarms or clusters or whatever) that are designed to run for long periods of time autonomously.

more reason to catch them otherwise we have to wait a longer time. in fact hiding is more correct if the AI was less autonomous right?

logicchains · 2026-02-16T16:14:25 1771258465

>Through that lens, these changes make more sense. They're not designing UX for a human sitting there watching the agent work. They're designing for horizontally scaling agents that work in uninterrupted stretches where the only thing that matters is the final output, not the steps it took to get there.

Even in that case they should still be logging what they're doing for later investigation/auditing if something goes wrong. Regardless of whether a human or an AI ends up doing the auditing.

faeyanpiraat · 2026-02-16T13:28:41 1771248521

Looking at it from far is simply making something large from a smaller input, so its kind of like nondeterministic decompression.

What fills the holes are best practices, what can ruin the result is wrong assumptions.

I dont see how full autonomy can work either without checkpoints along the way.

rco8786 · 2026-02-16T13:40:25 1771249225

Totally agreed. Those assumptions often compound as well. So the AI makes one wrong decision early in the process and it affects N downstream assumptions. When they finally finish their process they've built the wrong thing. This happens with one process running. Even on latest Opus models I have to babysit and correct and redirect claude code constantly. There's zero chance that 5 claude codes running for hours without my input are going to build the thing I actually need.

And at the end of the day it's not the agents who are accountable for the code running in the production. It's the human engineers.

adastra22 · 2026-02-16T14:20:42 1771251642

Actually it works the other way. With multiple agents they can often correct each others mistaken assumptions. Part of the value of this approach is precisely that you do get better results with fewer hallucinated assumptions.

Still makes this change from Anthropic stupid.

rco8786 · 2026-02-16T14:30:57 1771252257

The corrective agent has the exact same percentage chance at making the mistake. "Correcting" an assumption that was previously correct into an incorrect one.

If a singular agent has a 1% chance of making an incorrect assumption, then 10 agents have that same 1% chance in aggregate.

adastra22 · 2026-02-16T15:38:41 1771256321

You are assuming statistical independence, which is explicitly not correct here. There is also an error in your analysis - what matters is whether they make the same wrong assumption. That is far less likely, and becomes exponentially unlikely with increasing trials.

I can attest that it works well in practice, and my organization is already deploying this technique internally.

thesz · 2026-02-16T16:37:29 1771259849

How several wrong assumptions make it right with increasing trials?

adastra22 · 2026-02-16T16:49:47 1771260587

You can ask Opus 4.6 to do a task and leave it running for 30min or more to attempt one-shooting it. Imagine doing this with three agents in parallel in three separate work trees. Then spin up a new agent to decide which approach of the three is best on the merits. Repeat this analysis in fresh contexts and sample until there is clear consensus on one. If no consensus after N runs, reframe to provide directions for a 4th attempt. Continue until a clear winning approach is found.

This is one example of an orchestration workflow. There are others.

thesz · 2026-02-16T17:32:10 1771263130

  > Then spin up a new agent to decide which approach of the three is best on the merits. Repeat this analysis in fresh contexts and sample until there is clear consensus on one.

If there are several agents doing analysis of solutions, how do you define a consensus? Should it be unanimous or above some threshold? Are agents scores soft or hard? How threshold is defined if scores are soft? There is a whole lot of science in voting approaches, which voting approach is best here?

Is it possible for analyzing agents to choose the best of wrong solutions? E.g., longest remembered table of FizzBuzz answers amongst remembered tables of FizzBuzz answers.

adastra22 · 2026-02-16T20:31:29 1771273889

We have a voting algorithm that we use, but we're not at the level of confidential disclosure if we proceed further in this discussion. There's lots of research out there into unbiased voting algorithms for consensus systems.

thesz · 2026-02-16T22:19:34 1771280374

You conveniently decided not to answer my question about quality of the solutions to vote on (ranking FizzBuzz memorization).

To me, our discussion shows that what you presented as a simple thing is not simple at all, even voting is complex, and actually getting a good result is so hard it warrants omitting answer altogether.

adastra22 · 2026-02-17T04:44:28 1771303468

Yeah, you've got unrealistic expectations if you expect me to divulge my company's confidential IP in a HN comment.

thesz · 2026-02-17T07:16:40 1771312600

I had no expectations at all, I just asked questions, expecting answers. At the very beginning the tone of your comment, as I read it, was "agentic coding is nothing but simple, look they vote." Now answers to simple but important questions are "confidential IP."

Okay then, agentic coding is nothing but complex task requiring knowledge of unbiased voting (what is this thing really?) and, apparently, use of necessarily heavy test suite and/or theorem provers.

democracy · 2026-02-16T19:59:05 1771271945

It was a scene from a sci-fi movie (i mean Claude demo to CTOs)

groundzeros2015 · 2026-02-16T14:37:05 1771252625

Nonsense. If you have 16 binary decisions that’s 64k possible paths.

adastra22 · 2026-02-16T15:35:40 1771256140

These are not independent samplings.

groundzeros2015 · 2026-02-16T15:47:00 1771256820

Indeed. Doesn’t that make it worse? Prior decisions will bring up path dependent options ensuring they aren’t even close to the same path.

adastra22 · 2026-02-16T16:41:53 1771260113

Run a code review agent, and ask it to identify issues. For each issue, run multiple independent agents to perform independent verification of this issue. There will always be some that concur and some that disagree. But the probability distributions are vastly different for real issues vs hallucinations. If it is a real issue they are more likely to happen upon it. If it is a hallucination, they are more likely to discover the inconsistency on fresh examination.

This is NOT the same as asking “are you sure?” The sycophantic nature of LLMs would make them biased on that. But fresh agents with unbiased, detached framing in the prompt will show behavior that is probabilistically consistent with the underlying truth. Consistent enough for teasing out signal from noise with agent orchestration.

peyton · 2026-02-16T14:22:52 1771251772

Take a look at the latest Codex on very-high. Claude’s astroturfed IMHO.

rco8786 · 2026-02-16T14:32:58 1771252378

Can you explain more? I'm talking about LLM/agent behavior in a generalized sense, even though I used claude code as the example here.

What is Codex doing differently to solve for this problem?

KurSix · 2026-02-16T16:04:54 1771257894

If they're aiming for autonomy, why have a CLI at all? Just give us a headless mode. If I'm sitting in the terminal, it means I want to control the process. Hiding logs from an operator who’s explicitly chosen to run it manually just feels weird

fdefitte · 2026-02-16T20:11:25 1771272685

Agent teams working autonomously sounds cool until you actually try it. We've been running multi-agent setups and honestly the failure modes are hilarious. They don't crash, they just quietly do the wrong thing and act super confident about it.

oblio · 2026-02-17T09:41:07 1771321267

AI offshore teams in "yes cultures".