"Token anxiety", a slot machine by any other name (jkap.io)
175 points by presbyterian 19 hours ago | 147 comments



I know I'm running a bit late to the party here, but maybe someone can provide some color that I (on the slightly older end of the spectrum when it comes to this) don't fully understand.

When people talk about leaving their agents to run overnight, what are those agents actually doing? The limited utility I've had using agent-supported software development requires a significant amount of hand holding, maybe because I'm in an industry with limited externally available examples to build a model off of (though all of the specifications are public, I've yet to see an agent build an appropriate implementation).

So it's much more transactional...I ask, it does something (usually within seconds), I correct, it iterates again...

What sort of tasks are people putting these agents to? How are people running 'multiple' of these agents? What am I missing here?


This is my experience of it too. Perhaps if it was chunking through a large task like upgrading all of our repos to the latest engine supported by our cloud provider, I could leave it overnight. Even then it would just result in a large daylight backlog of "not quite right" to review and redo.

I wrote a program to classify thousands of images but that was using a model running on my gaming PC. Took about 3 days to classify them all. Only cost me the power right?

The gambling analogy completely falls apart on inspection. Slot machines have variable reward schedules by design — every element is optimized to maximize time on device. Social media optimizes for engagement, and compulsive behavior is the predictable output. The optimization target produces the addiction.

What's Anthropic's optimization target??? Getting you the right answer as fast as possible! The variability in agent output is working against that goal, not serving it. If they could make it right 100% of the time, they would — and the "slot machine" nonsense disappears entirely. On capped plans, both you and Anthropic are incentivized to minimize interactions, not maximize them. That's the opposite of a casino. It's ... alignment (of a sort)

An unreliable tool that the manufacturer is actively trying to make more reliable is not a slot machine. It's a tool that isn't finished yet.

I've been building a space simulator for longer than some of the people diagnosing me have been programming. I built things obsessively before LLMs. I'll build things obsessively after.

The pathologizing of "person who likes making things chooses making things over Netflix" requires you to treat passive consumption as the healthy baseline, which is obviously a claim nobody in this conversation is bothering to defend.


> What's Anthropic's optimization target??? Getting you the right answer as fast as possible!

What makes you believe this? The current trend in all major providers seems to be: get you to spin up as many agents as possible so that you can get billed more and their number of requests goes up.

> Slot machines have variable reward schedules by design

LLMs from all major providers are trained using RLHF, where they are optimized in ways we don't entirely understand to keep you engaged.

These are incredibly naive assumptions. Anthropic/OpenAI/etc don't care if you get your "answer solved quickly", they care that you keep paying and that all their numbers go up. They aren't doing this as a favor to you and there's no reason to believe that these systems are optimized in your interest.

> I built things obsessively before LLMs. I'll build things obsessively after.

The core argument of the "gambling hypothesis" is that many of these people aren't really building things. To be clear, I certainly don't know if this is true of you in particular, it probably isn't. But just because this doesn't apply to you specifically doesn't mean it's not a solid argument.


> The current trend in all major providers seems to be: get you to spin up as many agents as possible so that you can get billed more and their number of requests goes up.

Well stated


> What makes you believe this?

Simply, cut-throat competition. Given that multiple nations are funding different AI labs, quality of output and speed are among the most important things.


Dating apps also have cut-throat competition and none of them are optimised for minimising the time you spend on the app.


sigh We're doing this lie again? Quality of Outcome is not, has never been, and if the last 40 years are anything to go on will never be a core or even tangential goal. Dudes are trying to make the stock numbers go up and get paid. That's it. That's all it ever is.

Hey man people either get it or they don't. We're doomed.

How is nation-states funding private corporations "cut-throat competition"?

Ok, to be very honest I wrote that in the middle of having a couple of drinks. I guess what I mean is, countries are funding AI labs because it can turn into a “winner-takes-all” competition. Unless the country starts blocking the leading providers.

Private companies will turn towards the best, fastest, cheapest (or some average of them). Country borders don’t really matter. All labs are fighting to get the best thing out to the public for that reason, because winning comes with money, status, prestige, and actually changing the world. Incentives like these are rare.


What does this even mean? Are you disputing the fact that AI labs are competing with each other because they are funded by nation-states?

Why do you have to compete if you can just say "but China!" and get billions more dollars from the government?

He's disputing the idea that nationally funded business initiatives are competitive.

Cut-throat competition between nations is usually called war. In war, gathering as much information as possible on everyone is certainly a strategic priority. Selling psyops about how much benefit will come to everyone willing to join the one-sided industrial dependency is also a thing. Giving a significant boost to potentially adversarial actors is not a thing.

That said, the universe doesn't obligate us to think the cosmos is all about competition. Cooperation is always possible as a viable path, often with far more long-term benefit at scale.

Competition is superfluous, self-inflicted masochism.


> The current trend in all major providers seems to be: get you to spin up as many agents as possible so that you can get billed more and their number of requests goes up.

I was surprised when I saw that Cursor added a feature to set the number of agents for a given prompt. I figured it might be a performance thing - fan out complex tasks across multiple agents that can work on the problem in parallel and get a combined solution. I was extremely disappointed when I realized it's just "repeat the same prompt to N separate agents, let each one take a shot and then pick a winner". Especially when some tasks can run for several minutes, rapidly burning through millions of tokens per agent.

At that point it's just rolling dice. If an agent goes so far off-script that its result is trash, I would expect that to mean I need to rework the instructions and context I gave it, not that I should try the same thing again and hope that entropy fixes it. But editing your prompt offline doesn't burn tokens, so it's not what makes them money.


Cursor and others have a subagent feature, which sounds like what you wanted. However, there has to be some decision making around how to divide up a prompt into tasks. This is decided by the (parent) model currently.

The best-of-N feature is a bit like rolling N dice instead of one. But it can be quite useful if you use different models with different strengths and weaknesses (e.g. Claude/GPT-5/Gemini), rather than assigning all to N instances of Claude, for example. I like to use this feature in ask mode when diving into a codebase, to get an explanation a few different ways.


The bill is unrelated to their cost. If they can produce an answer in 1/10th of the tokens, they can charge 10x more per token, likely even more.

That is simply not true; token price is largely determined by the token price of rival services (even before their own operational costs). If everybody else charges about $1 per million tokens, then they will also charge about $1 per million tokens (or slightly above/below), regardless of how many answers per token they can provide.

This applies when there is a large number of competitors.

Now companies are fighting for the attention of a finite number of customers, so they keep their prices in line with those around them.

I remember when Google started with PPC - because few companies were using it, it cost a fraction of recent prices.

And the other issue to solve is a future lack of electricity for land-based data centers. If everyone wants to use LLMs, but data center capacity is finite due to available power, token prices can go up. But IMHO devs will find an innovative, less energy-demanding approach to tokens… so token prices will probably stay low.


Opus 4.6 costs about 5-10x of GLM 5.

It only matters if the rivals have the same performance. Opus pricing is 50x Deepseek's, and more than 100x that of small models. It only needs to match rivals if the performance is the same, and if they can produce a model with 10x lower token usage, they can charge 10x more.

Google increased the price of the same Gemini Flash model by something like 5x, IIRC, when it got better.


I bet that the actual "performance" of all the top-tier providers is so similar that branding has a bigger impact on whether you think Claude or ChatGPT performs better.

What businesses charge for a product is completely unrelated to what it costs them.

They charge what the market will bear.

If "what the market will bear" is lower than the cost of production then they will stop offering it.


Companies make a loss on purpose all the time.

Not forever. If that's their main business then they will eventually have to profit or they die.

> The gambling analogy completely falls apart on inspection. Slot machines have variable reward schedules by design — every element is optimized to maximize time on device. Social media optimizes for engagement, and compulsive behavior is the predictable output. The optimization target produces the addiction.

Intermittent variable rewards, whether produced by design or merely as a byproduct, will induce compulsive behavior, no matter the optimization target. This applies to Claude.


Sometimes I will go out and I will plant a pepper plant and take care of it all summer long and obsessively ensure it has precisely the right amount of water and compost and so on... and ... for some reason (maybe I was on vacation and it got over 105 degrees?) I don't get a good crop.

Does this mean I should not garden because it's a variable reward? Of course not.

Sometimes I will go out fishing and I won't catch a damn thing. Should I stop fishing?

Obviously no.

So what's the difference? What is the precise mechanism here that you're pointing at? Because "sometimes life is disappointing" is a reason to do nothing. And yet.


It's not a binary thing, it's a spectrum. There are many elements of uncertainty in every action imaginable. I'm inclined to agree with the other commenter though: the LLM slot machine is absolutely closer on that spectrum to gambling than your example is.

Anthropic's optimization target is getting you to spend tokens, not produce the right answer. It's to produce an answer plausible enough but incomplete enough that you'll continue to spend as many tokens as possible for as long as possible. That's about as close to a slot machine as I can imagine. Slot rewards are designed to keep you interested as long as possible, on the premise that you _might_ get what you want, the jackpot, if you play long enough.

Anthropic's game isn't limited to a single spin either. The small wins (small prompts with well defined answers) are support for the big losses (trying to one shot a whole production grade program).


> Anthropic's optimization target is getting you to spend tokens, not produce the right answer.

The majority of us are using their subscription plans with flat rate fees.

Their incentive is the precise opposite of what you say. The less we use the product, the more they benefit. It's like a gym membership.

I think all of the gambling addiction analogies in this thread are just so strained that I can't take them seriously. The basic facts aren't even consistent with the real situation.


That's a bit naive. Anthropic makes way more money if they get you to use past your plan's limit and wonder if you should get the next tier or switch to tokens.

The price jump between subscription tiers is so high that relatively few people will upgrade instead of waiting a few more hours. And even if somebody does upgrade to the next subscription level, Anthropic still has an incentive to provide satisfactory answers as quickly as possible, both to minimize tokens used per subscription and because there is plenty of competition, so any frustrated users are potential lost customers.

I swear this whole conversation is motivated reasoning from AI holdouts who so desperately want to believe everybody else is getting scammed by a gambling scheme that they don't stop and think about the situation rationally. Insofar as Claude is dominant, it's only because Claude works the best. There is meaningful competition in this market; as soon as Anthropic drops the ball they'll be replaced.


And we're still in the expansion phase, so LLM life is actually good... for now.

It's not going to get worse than now though. Open models like GLM 5 are very good. Even if companies decide to crank up the costs, the current open models will still be available. They will likely get cheaper to run over time as well (better hardware).

That's good to hear. I'm not really up-to-date on the open models, but they will become essential, I'm sure.

im on a subscription though.

they want me to not spend tokens. that way my subscription makes money for them rather than costing them electricity and degrading their GPUs


Wouldn't that apply only to a truly unlimited subscription? Last I looked all of their subs have a usage limit.

If you're on anything but their highest tier, it's not altogether unreasonable for them to optimize for the greatest number of plan upgrades (people who decide they need more tokens) while minimizing cancellations (people frustrated by the number of tokens they need). On the highest tier, this sort of falls apart but it's a problem easily solved by just adding more tiers :)

Of course, I don't think this is actually what's going on, but it's not irrational.


For subscription users, Anthropic makes more money if you hit your usage limit and wonder whether the next plan, or switching to tokens, would be better. Especially given the FOMO you probably have from all these posts talking about people's productivity.

> im on a subscription though.

Understood.

> they want me to not spend tokens.

No, they want you to expand your subscription. Maybe buy 2x subscriptions.


He's not going to do that if all Claude can do is waste tokens for hours.

> you'll continue to spend as many tokens as possible for as long as possible.

I mean this only works if Anthropic is the only game in town. In your analogy if anyone else builds a casino with a higher payout then they lose the game. With the rate of LLM improvement over the years, this doesn't seem like a stable means of business.


I don't know if this applies to AI usage, but actual gambling addicts most certainly do not shop around for the best possible rewards: they stick more or less to the place they got addicted at initially. Not to mention, there's plenty of people addicted to "casinos" that give 0 monetary rewards, such as Candy Crush or Farmville back in the day and Genshin Impact or other gacha games today.

So, if there's a way to get people addicted to AI conversations, that's an excellent way to make money even if you are way behind your competitors, as addicted buyers are much more loyal than other clients.


You're taking the gambling analogy too seriously. People do in fact compare different LLMs and shop around. How gamblers choose casinos is literally irrelevant because this whole analogy is nothing more than a retarded excuse for AI holdouts to feel smug.

The timescale is one difference, it's hard to get "sucked in" in the gambling-like mindless state when the timescales are over seasons as opposed to minutes. There's a reason gambling isn't done in a correspondence format.

In human physiology/psychology as well, the chance of addiction itself is a function of timescale. This is why a nicotine patch is much less addictive than insufflated nicotine (hours to reach peak effect vs seconds), or why addictive software have plenty of sensory experiences attached to every action, to keep the user engaged.

Are you a pepper farmer taking this approach to feed your family, or a hobbyist gardener?


??? I'm pretty sure you know what the differences are. Go touch grass and tell me it's the same as looking at a plant on a screen.

Dealing with organic and natural systems will, most of the time, have a variable reward. The real issue comes from systems and services designed to only be accessible through intermittent variable rewards.

Oh, and don't confuse Claude's artifacts working most of the time with them actually optimizing for that. They're optimizing to ensure token usage, i.e. LLMs have been fine-tuned to default to verbose responses. Verbose output is impressive to less experienced developers, often makes certain types of errors easier to detect (e.g. improper typing), and makes you use more tokens.


So gambling is fine as long as I'm doing it outside. Poker in a casino? Bad. Poker in a foresty meadow, good. Got it.

Basically true tbqh. Poker is maybe the one exception, but you're almost always better off gambling "in the wild" e.g. poker night with your buds instead of playing slots or anything else where "the house" is always winning in the long run. Are your losses still circulating in your local community, or have they been siphoned off by shareholders on the other side of the world? Gambling with friends is just swapping money back and forth, but going to a casino might as well be lighting the money on fire.

> Intermittent variable rewards, whether produced by design or merely as a byproduct, will induce compulsive behavior, no matter the optimization target.

This is an incorrect understanding of intermittent variable reward research.

Claims that it "will induce compulsive behavior" are not consistent with the research. Most rewards in life are variable and intermittent and people aren't out there developing compulsive behavior for everything that fits that description.

There are many counter-examples, such as job searching: It's clearly an intermittent variable reward to apply for a job and get a good offer for it, but it doesn't turn people into compulsive job-applying robots.

The strongest addictions to drugs also have little to do with being intermittent or variable. Someone can take a precisely measured abuse-threshold dose of a drug on a strict schedule and still develop compulsions to take more. Compulsions at a level that eclipse any behavior they'd encounter naturally.

Intermittent variable reward schedules can be a factor in increasing anticipatory behavior and rewards, but claiming that they "will induce compulsive behavior" is a severe misunderstanding of the science.


And that's only bad if it's illusory or fake. This reaction evolved because it's adaptive. In slot machines the brain is tricked into believing there is some strategy or method to crack, and the reward signals make the addict feel there is some kind of progress being made in return for some kind of effort.

The variability in e.g. soccer kicks or basketball throws is also there, but clearly there is a skill element and a potential for progress. Same with many other activities. Coding with LLMs is not so different. There are clearly ways you can do it better and it's not pure randomness.


>Intermittent variable rewards,

So you're saying businesses shouldn't hire people either?


>The pathologizing of "person who likes making things chooses making things over Netflix" requires you to treat passive consumption as the healthy baseline, which is obviously a claim nobody in this conversation is bothering to defend

I think their greater argument was to highlight how agentic coding is eroding work life balance, and that companies are beginning to make that the norm.


> The gambling analogy completely falls apart on inspection.

yeah I think the bluesky embed is much more along the lines of what I'm experiencing than the OP itself.


Right. A platform that makes money the more you have to use it is definitely optimizing to get you the right answer in as few tokens as possible.

There is absolutely no incentive to do that, for any of these companies. The incentive is to make the model just bad enough you keep coming back, but not so bad you go to a competitor.

We've already seen this play out. We know Google made their search results worse to drive up ad revenue. The exact same incentives are at play here, only worse.


Please go read how the Anthropic max plan works.

IF I USE FEWER TOKENS, ANTHROPIC GETS MORE MONEY! You are blindly pattern matching to "corporation bad!" without actually considering the underlying structure of the situation. I believe there's a phrase for this to do with probabilistic avians?


As an investor in Anthropic, which pricing strategy would you support? That's the question you need to ask, not whatever their current pricing strategy happens to be during the win-the-market phase.

It’s sort of surprising how naive developers still are given the countless rug pulls over the past decade or two.

You’re right on the money: the important things to look at are the incentive structures.

Basically all tech companies from the post-great-financial-crisis expansion (Google, post-Ballmer Microsoft, Twitter, Instagram, Airbnb, Uber, etc) started off user-friendly but all eventually converged towards their investment incentive structure.

One big exception is Wikipedia. Not surprising since it has a completely different funding model!

I’m sure Anthropic is super user friendly now, while they are focused on expansion and founding devs still have concentrated political sway. It will eventually converge on its incentive structures to extract profit for shareholders like all other companies.


The Max plan has usage limits, and you can buy more... Which is exactly what I'm talking about...

And the incentive is even stronger for the lower tiers. They want answers to be just good enough to keep you using it, but bad enough that you're pushed towards buying the higher tier.


Have you actually used a max plan? You have to try really damn hard to get close to the max plan usage. I don't think that's something that realistically happens by accident, you have to be deliberately spawning a huge number of subagents or something.

What if I use zero tokens, as I'm currently doing? Do they get any money then?

Anthropic themselves have described CC as a slot machine:

https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135a...

(cmd-f "slot machine")


> What's Anthropic's optimization target??? Getting you the right answer as fast as possible!

Are you totally sure they are not measuring/optimizing engagement metrics? Because at least I can bet OpenAI is doing that with every product they have to offer.


> What's Anthropic's optimization target??? Getting you the right answer as fast as possible!

That is a generous interpretation. Might be correct. But they don't make as much money if you quickly get the right answer. They make more money if you spend as many tokens as possible being on that "maybe next time" hook.

I'm not saying they're actually optimizing for that. But Charlie Munger said "show me the incentives, and I'll show you the outcome".


I know for sure that each and every AI I use wants to write whole novellas in response to every prompt unless I carefully remind it to keep responses short over and over and over again.

This didn't used to be the case, so I assume that it must be intentional.


I've noticed this getting a lot worse recently. I just want to ask a simple question, and end up getting a whole essay in response, an 8-step plan, and 5 follow-up questions. Lately ChatGPT has also been referencing previous conversations constantly, as if to prove that it "knows" me.

"Should I add oregano to brown beans or would that not taste good?"

"Great instinct! Based on your interests in building new apps and learning new languages, you are someone who enjoys discovering new things, and it makes sense that you'd want to experiment with new flavor profiles as well. Your combination of oregano and brown beans is a real fusion of Italian and Mexican food, skillfully synthesizing these two cultures.

Here's a list of 5 random unrelated spices you can also add to brown beans:

Also, if you want to, I can create a list of other recipes that incorporate these oregano. Just say the words "I am hungry" and I will get right back to it!"

Also, random side note, I hate ChatGPT asking me to "say the word" or "repeat the sentence". Just ask me if I want it and then I say yes or no, I am not going to repeat "go oregano!" like some sort of magic keyphrase to unlock a list of recipes.


> "person who likes making things chooses making things over Netflix"

This is subtly different. It's not clear that the people depicted like making things, in the sense of enjoying the process. The narrative is about LLMs fitting into the already-existing startup culture. There's already a blurry boundary between "risky investment" and "gambling", given that most businesses (of all types, not just startups) have a high failure rate. The socially destructive characteristic identified here is: given more opportunity to pull the handle on the gambling machine, people are choosing to do that at the expense of other parts of their life.

But yes, this relies on a subjective distinction between "building, but with unpredictable results" and "gambling, with its associated self-delusions".


> The gambling analogy completely falls apart on inspection.

The analogy was too strained to make sense.

Despite being framed as a helpful plea to gambling addicts, I think it’s clear this post was actually targeted at an anti-LLM audience. It’s supposed to make the reader feel good for choosing not to use them by portraying LLM users as poor gambling addicts.


> What's Anthropic's optimization target??? Getting you the right answer as fast as possible!

Wait, what? Anthropic makes money by getting you to buy and expend tokens. The last thing they want is for you to get the right answer as fast as possible. They want you to sometimes get the right answer unpredictably, but with enough likelihood that this time will work that you keep hitting Enter.


At one point, people said Google's optimization target was giving you the right search results as soon as possible. What will prevent Anthropic from falling into the same pattern of enshittification as its predecessors, optimizing for profit like all other businesses?

Slightly off topic actually, but I'll put it here.

I found it interesting that Google removed the "summary cards", supposedly "to improve user experience", yet the AI overview was added back.

I suspect the AI overview is much more influenceable by advertising money than the summary cards were.


I stopped using Google years ago because they stopped trying to provide good search results. If Anthropic stops trying to provide a good coding agent, I'll stop using them too.

Doesn't the alignment sort of depend on who is paying for all the tokens?

If Dave the developer is paying, Dave is incentivized to optimize token use along with Anthropic (for the different reasons mentioned).

If Dave's employer, Earl, is paying and is mostly interested in getting Dave to work more, then what incentive does Dave have to minimize tokens? He's mostly incentivized by Earl to produce more code, and now also by Anthropic's accidentally variable-reward coding system, to code more... ?


The LLM is not the slot machine. The LLM is the lever of the slot machine, and the slot machine itself is capitalism. Pull the lever, see if it generates a marketable product or moment of virality, get rich if you hit the jackpot. If not, pull again.

This does seem like a person getting hooked on idle games, or mobile/online games with artificially limited progress (that you can pay to lift). It's a type of delayed gratification that makes you anxious to get the next one.

Not everyone gets hooked on those, but I do. I've played a bunch of those long-winded idle games, and it looks like a slight addiction. I would get impatient that it takes so long to progress, and it would add anxiety to e.g. run this during breaks at work, or just before going to sleep. "Just one more click".

And to be perfectly honest, it seems like the artificial limits of Anthropic (5 hour session limits) tap into a similar mechanism. I do fewer non-programming hobbies since I got myself a subscription.


I’d rather grind Runescape at this point, than become addicted at trying to automate away my job.

I wish the author had stuck to the salient point about work/life balance instead of drifting into the gambling tangent, because the core message is actually more unsettling. With the tech job market being rough and AI tools making it so frictionless to produce real output, the line between work time and personal time is basically disappearing.

To the bluesky poster's point: Pulling out a laptop at a party feels awkward for most; pulling out your phone to respond to claude barely registers. That’s what makes it dangerous: It's so easy to feel some sense of progress now. Even when you’re tired and burned out, you can still make progress by just sending off a quick message. The quality will, of course, slip over time; but far less than it did previously.

Add in a weak labor market and people feel pressure to stay working all the time. Partly because everyone else is (and nobody wants to be at the bottom of the stack ranking), and partly because it’s easier than ever to avoid hitting a wall by just "one more message". Steve Yegge's point about AI vampires rings true to me: A lot of coworkers I’ve talked to feel burned out after just a few months of going hard with AI tools. Those same people are the ones working nights and weekends because "I can just have a back-and-forth with Claude while I'm watching a show now".

The likely result is the usual pattern for increases in labor productivity. People who can’t keep up get pushed out, people who can keep up stay stuck grinding, and companies get to claim the increase in productivity while reducing expenses. Steve's suggestion of shorter workdays sounds nice in theory, but I would bet significant amounts of money the 40-hour work week remains the standard for a long time to come.


Another interesting thing here is that the gap between "burned out but just producing subpar work" and "so crispy I literally cannot work" is even wider with AI. The bar for just firing off prompts is low, but the mental effort required to know the right prompts to ask and then validate is much higher so you just skip that part. You can work for months doing terrible work and then eventually the entire codebase collapses.

> With the tech job market being rough and AI tools making it so frictionless to produce real output, the line between work time and personal time is basically disappearing.

This isn't generally true at all. The "all tech companies are going to 996" meme comes up a lot here but all of the links and anecdotes go back to the same few sources.

It is very true that the tech job market is competitive again after the post-COVID period where virtually nobody was getting fired and jobs were easy to find.

I do not think it's true that the median or even 90th percentile tech job is becoming so overbearing that personal time is disappearing. If you're at a job where they're trying to normalize overwork as something everyone is doing, they're just lying to you to extract more work.


It would never show up as some explicit rule or document. It just sort of happens when a few things line up: execs start off-handedly praising 996, stack ranking is still a thing, and the job market is bad enough that getting fired feels genuinely dangerous.

It starts with people who feel they’ve got more to lose (like those supporting a family) working extra to avoid looking like a low performer, whether that fear is reasonable or not. People aren’t perfectly rational, and job-loss anxiety makes them push harder than they otherwise would. Especially now, when "pushing harder" might just mean sending chat messages to claude during your personal time.

Totally anecdotal (strike 1), and I'm at a FAANG which is definitely not the median tech job (strike 2), but it’s become pretty normal for me to come back Monday to a pile of messages sent by peers over the weekend. A couple years ago even that was extremely unusual; even if people were working on the weekend they at least kept up a facade that they weren't.


I know it's popular comparing coding agents to slot machines right now, but the comparison doesn't entirely hold for me.

It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it.

(I saw "no actual evidence pointing to these improvements" with a footnote and didn't even need to click that footnote to know it was the METR thing. I wish AI holdouts would find a few more studies.)

Steve Yegge of all people published something the other day that has similar conclusions to this piece - that the productivity boost for coding agents can lead to burnout, especially if companies use it to drive their employees to work in unsustainable ways: https://steve-yegge.medium.com/the-ai-vampire-eda6e4f07163


Yeah I'm finding that there's "clock time" (hours) and "calendar time" (days/weeks/months) and pushing people to work 'more' is based on the fallacy that our productivity is based on clock time (like it is in a factory pumping out widgets) rather than calendar time (like it is in art and other creative endeavors). I'm finding that even if the LLM can crank out my requested code in an hour, I'll still need a few days to process how it feels to use. The temptation is to pull the lever 10 times in a row because it was so easy, but now I'll need a few weeks to process the changes as a human. This is just for my own personal projects, and it makes sense that the business incentives would be even more intense. But you can't get around the fact that, no matter how brilliant your software or interface, customers are not going to start paying in a few hours.

> The temptation is to pull the lever 10 times in a row because it was so easy, but now I'll need a few weeks to process the changes as a human

Yeah I really feel that!

I recently learned the term "cognitive debt" for this from https://margaretstorey.com/blog/2026/02/09/cognitive-debt/ and I think it's a great way to capture this effect.

I can churn out features faster, but that means I don't get time to fully absorb each feature and think through its consequences and relationships to other existing or future features.


If you are really good and fast validating/fixing code output or you are actually not validating it more than just making sure it runs (no judging), I can see it paying out 95% of the time.

But from what I've seen validating both my own and others' coding agents' outputs, I'd estimate a much lower percentage (Data Engineering/Science work). And, oh boy, some colleagues are hooked on generating no matter the quality. Workslop is a very real phenomenon.


This matches my experience using LLMs for science. Out of curiosity, I downloaded a randomized study and the CONSORT checklist, and asked Claude code to do a review using the checklist.

I was really impressed with how it parsed the structured checklist. I was not at all impressed by how it digested the paper. Lots of disguised errors.


try codex 5.3. it's dry and very obviously AI; if you allow a bit of anthropomorphisation, it's kind of high-functioning autistic. it isn't an oracle, it'll still be wrong, but it's a powerful tool, completely different from claude.

Does it get numbers right? One of the mistakes it made in reading the paper was swapping sets of numbers from the primary/secondary outcomes.

it does get screenshots right for me, but obviously I haven't tried on your specific paper. I can only recommend trying it out; it also has much more generous limits in the $20 tier than opus.

I see. To clarify, it parsed numbers in the pdf correctly, but assigned them the wrong meaning. I was wondering if codex is better at interpreting non-text data.

Every time someone suggests Codex I give it a shot. And every time it disappoints.

After I read your comment, I gave Codex 5.3 the task of setting up an E2E testing skeleton for one of my repos, using Playwright. It worked for probably 45 minutes and in the end failed miserably: out of the five smoke tests it created, only two of them passed. It gave up on the other three and said they will need “further investigation”.

I then stashed all of that code and gave the exact same task to Opus 4.5 (not even 4.6), with the same prompt. After 15 mins it was done. Then I popped Codex’s code from the stash and asked Opus to look at it to see why three of the five tests Codex wrote didn’t pass. It looked at them and found four critical issues that Codex had missed. For example, it had failed to detect that my localhost uses https, so the E2E suite’s API calls from the Vue app kept failing. Opus also found that the two passing tests were actually invalid: they checked for the existence of a div with #app and simply assumed it meant the Vue app booted successfully.

This is probably the dozenth comparison I’ve done between Codex and Opus. I think there was only one scenario where Codex performed equally well. Opus is just a much better model in my experience.


moral of the story is use both (or more) and pick the one that works - or even merge the best ideas from generated solutions. independent agentic harnesses support multi-model workflows.

I don't think that's the moral of the story at all. It's already challenging enough to review the output from one model. Having to review two, and then comparing and contrasting them, would more than double the cognitive load. It would also cost more.

I think it's much more preferable to pick the most reliable one and use it as the primary model, and think of others as fallbacks for situations where it struggles.


you should always benchmark your use cases and you obviously don't review multiple outputs; you only review the consensus.

see how perplexity does it: https://www.perplexity.ai/hub/blog/introducing-model-council


I was going to mention Yegge's recent blog posts mirroring this phenomenon.

There's also this article on hbr.org https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies...

This is a real thing, and it looks like classic addiction.


Being on a $200 plan is a weird motivator. Seeing the unused weekly limit for codex and the clock ticking down, and knowing I can spam GPT 5.2 Pro "for free" because I already paid for it.

That 95% payout only works if you already know what good looks like. The sketchy part is when you can't tell the diff between correct and almost-correct. That's where stuff goes sideways.

It's 95% if you're using it for the stuff it's good at. People inevitably try to push it further than that (which is only natural!), and if you're operating at/beyond the capability frontier then the success rate eventually drops.

Just need to point out that the payout is often above 95% at online casinos. As long as it's below 100 the house still wins.

He means a slot machine that pays you 95% of the time, not a slot machine that pays out 95% of what you put in.

Claude Code wasting my time with nonsense output one in twenty times seems roughly correct. The rest of the time it's hitting jackpots.


thanks, that steve yegge piece was a very good read.

> It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it

Right but the <100% chance is actually why slot machines are addictive. If it pays out continuously the behaviour does not persist as long. It's called the partial reinforcement extinction effect.


> It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it.

“It’s not like a slot machine, it’s like… a slot machine… that I feel good using”

That aside if a slot machine is doing your job correctly 95% of the time it seems like either you aren’t noticing when it’s doing your job poorly or you’ve shifted the way that you work to only allow yourself to do work that the slot machine is good at.


Rather than let results be random, iteratively and continuously add more and more guardrails and grounding.

Tests, linting, guidance in response to key events (Claude Code hooks are great for this), automatically passing the agent’s code plan to another model invocation and then passing back whatever feedback that model has on the plan so you don’t have to point out the same flaws in plans over and over... custom scripts that iterate over your codebase for antipatterns (they can walk the AST or be regex based - ask your agent to write them! one is sketched below).

Codify everything you’re looping back to your agent about and make it a guardrail. Give your agent the tools it needs to give itself grounding.

An agent without guardrails or grounding is like a person unconnected to their senses: disconnected from the world, all you do is dream - in a dream anything can happen, there’s nothing to ensure realism. When you look at it that way it’s a miracle coding agents produce anything useful at all :)
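
To make the antipattern-script idea above concrete, here is a minimal sketch, assuming a Python codebase and a hypothetical team rule against bare "except:" clauses; the rule, file layout, and the suggestion to wire it into a pre-commit step or Claude Code hook are illustrative assumptions, not something the commenter prescribed.

    import ast
    import sys
    from pathlib import Path

    def find_bare_excepts(root: Path) -> list[str]:
        """Collect file:line locations of bare `except:` clauses under root."""
        findings = []
        for path in sorted(root.rglob("*.py")):
            try:
                tree = ast.parse(path.read_text(), filename=str(path))
            except (SyntaxError, UnicodeDecodeError):
                findings.append(f"{path}: could not parse")
                continue
            for node in ast.walk(tree):
                # ExceptHandler.type is None exactly when the clause is a bare `except:`.
                if isinstance(node, ast.ExceptHandler) and node.type is None:
                    findings.append(f"{path}:{node.lineno}: bare except clause")
        return findings

    if __name__ == "__main__":
        root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
        problems = find_bare_excepts(root)
        print("\n".join(problems) if problems else "no antipatterns found")
        sys.exit(1 if problems else 0)  # non-zero exit is the feedback the agent sees

Run as a pre-commit step or a hook after each edit, the non-zero exit code becomes the grounding signal the agent sees, so you aren't re-explaining the same rule every session.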


If you are trying to build something well represented in the training data, you could get a usable prototype.

If you are unfamiliar with the various ways that naive code would fail in production, you could be fooled into thinking generated code is all you need.

If you try to hold the hand of the coding agents to bring code to a point where it is production ready, be prepared for a frustrating cycle of models responding with ‘Fixed it!’ while only having introduced further issues.


How are we still citing the (excellent) METR study in support of conclusions about productivity that its authors rightly insist[0] it does not support?

My paraphrase of their caveats:

- experts on their own open source proj are not representative of most software dev

- measuring time undervalues trading time for effort

- tools are noticeably better than they were a year ago when the study was conducted

- it really does take months of use to get the hang of it (or did then, less so now)

Before you respond to these points, please look at the full study’s treatment of the caveats! It’s fantastic, and it’s clear almost no one citing the study actually read it.

[0]: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...


It's very tempting to agree to the 'gambling' part, given that both a jackpot and progress towards the goal in your project will give you a hit of dopamine.

The difference is that in gambling 'the house always wins', but in our case we do make progress towards our goal of conquering the world with our newly minted apps.

The situation where this comparison holds is when vibe coding leads nowhere and you don't accomplish anything but just burn through tokens.


if a message board that allows the sharing of videos is addictive (facebook, tiktok), then LLMs are 100% addictive. and by the same retard logic books are addictive. it's hysteria, and just like people REALLY BELIEVED TV ROTS YOUR BRAINS people REALLY BELIEVE AI SLOP ROTS YOUR BRAINS.

watch as the hysteria passes, and just like the tv scare, nobody cares anymore in roughly 20 years or so. shame on all of you


Sounds like you've had too much TV. It really does rot your brain, this is obvious to anybody who doesn't watch TV, but completely imperceptible to those who do.

> It really does rot your brain, this is obvious to anybody who doesn't watch TV, but completely imperceptible to those who do.

how do you block video on your PC? or do you literally mean audiovisual information broadcast onto actual television sets is the evil?


When you watch television, or television on your computer screen (that makes no difference), you get hypnotized by the tube into a passive state of consumption. Watch people when they watch TV. Watch their slack-jawed faces when the commercials stay on and their attention stays glued to the advertisements pitching Alzheimer's drugs. Critical thought suspended, minds off in space.

In short, read a book.


> The difference is that in gambling 'the house always wins', but in our case we do make progress towards our goal of conquering the world with our newly minted apps.

What? Your vibe coded slop is just going to be competing with someone else's vibe coded slop.


The motivations for wanting to make the slop could be commercial profit, or it could be simply you trying to solve a problem for yourself. In either case, the slop is the goal and, if the agent isn't giving you complete trash, you should be converging towards your goal. The gambling analogy doesn't work.

what kind of lame parties is the bluesky poster going to? is this a San Francisco thing?

Yeah, honestly seems like that guy is looking for a scapegoat to blame for himself being lame. If you can't put work down and let loose, that's a you problem, not a technology problem.

I certainly hope the Bsky post is satire, but I honestly can't tell anymore.

I think that in a world where code has zero marginal cost (or close to zero, for the right companies), we need to be incredibly cognizant of the fact that more code is not more profit, nor is it better code. Simpler is still better, and products with taste omit features that detract from vision. You can scaffold thousands of lines of code very easily, but this makes your codebase hard to reason about, maintain, and work in. It is like unleashing a horde of mid-level engineers with spec documents and coming back in a week with everything refactored wrong. Sure you have some new buttons but does anyone (or can any AI agent, for that matter) understand how it works?

And to another point: work life balance is a huge challenge. Burnout happens in all departments, not just engineering. Managers can get burnout just as easily. If you manage AI agents, you'll just get burnout from that too.


I don't think gambling is the right analogy at all.

I do think it can be addictive, but there are many things that are addictive that aren't gambling.

I think a better analogy is something like extreme sport, where people can get addicted to the point it can be harmful.


At least with gambling, there's the chance of hitting a jackpot.

What's with the lack of capitalisation at the start of sentences? It makes it hard to parse where one sentence ends and the next begins.

why aren't books accused of addictive engineering? simply move the printed words from paper to digital and it becomes addictive somehow.

I wrote a similar blogpost just a few days ago: "The Vibe Coding Slot Machine" (https://news.ycombinator.com/item?id=47022282)

Probably the best we can hope for at the moment is a reduction in the back-and-forth, increase in ability to one-shot stuff with a really good spec. The regular human work then becomes building that spec, in regular human (albeit AI-assisted) ways.

Is the "back and forth" thing normal for AI stuff, then? Because every time I've attempted to use Claude or Copilot for coding stuff, it's been completely unable to do anything on its own, and I've ended up writing all of the code while it's just kind of introduced misspellings into it.

Maybe someone can show me how you're supposed to do it, because I have seen no evidence that AI can write code at all.


Very much normal yes. This is why I've been (so far) still mainly sticking to having it as an all-knowing oracle telling me what I need to know, which it mostly does successfully.

When it works for pure generation it's beautiful, when it doesn't it's ruinous enough to make me take two steps back. I'll have another go at getting with all the pure agentic rage everyone's talking about soon enough.


Step 1: deposit money into an Anthropic API account

Step 2: download Zed and paste in your API Key

Step 3: Give detailed instructions to the assistant, including writing ReadMe files on the goal of the project and the current state of the project

Step 4: stop the robot when it's making a dumb decision

Step 5: keep an eye on context size and start a new conversation every time you're half full. The more stuff in the context the dumber it gets.

I spent about 500 dollars and 16 hours of conversation to get an MVP static marketplace [0], a ruby app that can be crawled into static (and js-free!) files, without writing a single line of code myself, because I don't know ruby. This included a rather convoluted data import process, loading the database from XML files of a couple different schemas.

Only thing I had to figure out on my own was how to upload the 140,000 pages to cloudflare free tier.

[0] https://motorcycledealer.com/


> Step 4: stop the robot when it's making a dumb decision

Yeah I can't stop myself when I'm about to make a dumb decision, just look at my github repo. I ported Forth to a 1980s sampler and wrote DSP code on an 8-bit Arduino.

How am I going to stop a robot making dumb decisions?

Also, this all sounds like I'm doing a lot of skivvy work typing stuff in (which I hate) and not actually writing much code (which is the bit I like).


The robot will output text like “Oh, I see, the user wants me to make a Lovecraftian horror with asynchronous subprocess calls instead of HTTP endpoints, so I better suggest we reinstall the dependencies that are already installed so we can sacrifice this project to Mammoth”

It is at this point where you can say “NONONO YOU ABSOLUTE DONKEY stop that we just want a FastAPI endpoint!!” And it will go “You’re absolutely right, I was over complicating this!”


Correct.

I did waste about 20 minutes trying to do a recursive link following crawl (to write each rendered page to file), because Opus wanted to write a ruby task to do it. It wasn’t working so I googled it and found out link following is a built in feature of cURL…


Step 1 is where Anthropic lost me.

1. If you don't use it soon enough, they keep it (shame on them, do the things you need to in order to be a money transmitter, you have billions of dollars)

2. Pay-go with billing warning and limits. You can use Claude like this through Google VertexAI


there's a lot of back and forth for describing what you actually want, the design, the constraints, and testing out the feedback loops you set up for it to be able to tell if it's on the right track or not.

when it's actually writing code it's pretty hands off, unless you need to course correct to point it in a better direction


One of my recent thoughts is that Claude Code has become the most successful agent partially because it is more of a black box than previous implementations of the agent pattern: the actual code changes aren't shoved in your face like Cursor (used to be), they are hidden away. You focus more on the result rather than the code building up that result, and so you get into the "just one more feature" mindset a lot more, because you're never concerned that the code you're building is sloppy.

It's because claude code will test its work and adjust the code accordingly. The old way, with how Cursor used to be or the way I used to copy and paste code from ChatGPT, doesn't work because iterating towards a working solution requires too much human effort, making the whole thing pointless.

Cursor & its various clones (Cline, Roo Cline/Code) did that too, before Claude Code was even released.

This is simply yet another outdated analogy from haters that are failing to keep pace with the current frontier because they are too busy getting high on the anti-hype.

We’re well past the need to retry the same prompt multiple times in order to get working code. The models with their harnesses are properly agentic now, they can find the right context, make a plan, write the code, run the tests and fix the bugs with little to no intervention from a human.

The hardest part now is keeping up with them when it comes to approving the deliverables and updating the architecture and spec as new things are discovered by using the software. Not new bugs but corrections to your own assumptions you had before the feature was built.

The hard part is almost entirely management.

That’s something to seriously think about.


> This is simply yet another outdated analogy from haters that are failing to keep pace with the current frontier because they are too busy getting high on the anti-hype.

Touché.


After actually using LLM coding tools for a while, some of these anti-LLM thinkpieces feel very contrived. I don’t see the comparison to gambling addiction at all. I understand how someone might believe that if they only view LLMs through second hand Twitter hot takes and think that it’s a process of typing a prompt and hoping for the best. Some people do that, but the really effective coders work with the LLM and drive the coding, writing some or much of the code themselves. The social media version of vibe coding where you just prompt continuously and hope for the best is not going to work in any serious endeavor where details matter. We see claims of it in some high profile examples like OpenClaw, but even OpenClaw has maintainers and contributors who look at the code and make decisions. It’s also riddled with security problems as a result of the YOLO coding style.

And a related issue is that if you have a coding plan with time-based limits, there is pressure to make maximum use of it.

Simple fix for this. When the work day is done, close the laptop and walk away. Don't link notifications to personal devices. Whatever slop it produced will be waiting for you at 8am the next morning.

Not sure if the original post is satire, but it reads like an alcoholic's submission to https://xkcd.com/1227/

Ironically, the linked text by this Kellogg guy is 100% AI slop itself.

what's ironic? The conversation under the post adds a lot of color to Tim's thoughts

It's ironic that the linked thoughts in the post (the black and white text screenshots) lamenting AI overtaking our lives are themselves written by AI.

wait wait wait the BlueSky post is not a parody? It's actually serious???

I really cannot tell


Pathetic.

Funemployed right now joyously spending way way more time than 996, pulling the slot machine arm to get tokens, having a ball.

But that's for personal pleasure. This post is receding from the concerns about "token anxiety," about the addiction to tokens. This post is about work culture & late capitalism anxiety, about possible pressures & systems society might impose.

I reflect a lot on "AI doesn't reduce the work, it intensifies it." https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies... The spirit of this really nails something core to me. We coders especially get help doing so much of the menial now. Which means we spend a lot more time on intense analysis and critique, doing the hard thought-work of "is what we have here as good as it can be?". Finding new references or patterns to feed back into the AI to steer already working implementations towards better outcomes.

And my heart tells me that corporations & work life as we know it are almost universally just really awful about supporting reflective contemplative work like this. Work wants output. It doesn't want you to sit in a hammock and think about it. But increasingly I tell you the key to good successful software is Hammock Driven Development. It's time to use our brains more, in quiet reflection. https://github.com/matthiasn/talk-transcripts/blob/master/Hi...

996 sounds like garbage on its own, as a system of toil. But I also very much respect an idea of continuous work, one that also intersperses rest throughout the day. Doing some chores or going to the supermarket or playing with the kid can be an incredibly good way to let your preconscious sift through the big gnarly problems about. The response to the intensity of what we have, to me, speaks of a need to spread out the work, to de-concentrate it, to build in more than hammock time. I was on the fence about whether the traditional workday deserved to survive before AI hit, and my feels about it being a gross mismatch have massively intensified since.

As I started my post with, I personally have a much more positive experience, with what yes feels like a token addiction. But it doesn't feel like an anxiety. It feels like the greatest, most exciting adventure, far beyond what I had ever hoped for in life. This is wildly fun, going far far further out than I had ever hoped to get to see. I'm not "anxiously" pulling the lever arm on the token machine, I'm just thrilled to get to do it. To have time to reflect and decide, I have 3-8 things going at once (and probably double that back-burnered but open, on Niri rows!) to let myself make slower decisions, to analyze, while keeping the things that can safely move forwards moving forwards.

That also seems like something worker-exploitative late capitalism is mostly hot garbage at too! Companies really try to reduce in-flight activities. Sprint planning is about crafting deliberate work. But our freedom and agency here far outstrip these dusty old practices. It is anxiety inducing to be so powerful, so capable, & to have a bureaucracy that constrains and confines, that wants only narrow windows of our use.

Also, shame on Tim Kellogg for not God damned linking the actual post he was citing. Garbagefire move. https://writing.nikunjk.com/p/token-anxiety https://news.ycombinator.com/item?id=47021136


> 996 sounds like garbage on its own, as a system of toil. But I also very much respect an idea of continuous work, one that also intersperses rest throughout the day. Doing some chores or going to the supermarket or playing with the kid can be an incredibly good way to let your preconscious sift through the big gnarly problems about.

I _kind_ of get this if we're talking about working on big, important, world-changing problems. If it's another SaaS app or something like that, I find it pretty depressing.



