__jl__ · 2026-03-05T20:54:36 1772744076

What a model mess!

OpenAI now has three price points: GPT 5.1, GPT 5.2 and now GPT 5.4. Their version numbers jump across different model lines, with Codex at 5.3 and what they now call Instant also at 5.3.

Anthropic are really the only ones who managed to get this under control: Three models, priced at three different levels. New models are immediately available everywhere.

Google essentially only has Preview models! The last GA is 2.5. As a developer, I can either use an outdated model or have zero assurance that the model doesn't get discontinued within weeks.

strongpigeon · 2026-03-05T21:00:06 1772744406

> Google essentially only has Preview models! The last GA is 2.5. As a developer, I can either use an outdated model or have zero assurance that the model doesn't get discontinued within weeks.

What's funny is that there is this common meme at Google: you can either use the old, unmaintained tool that's used everywhere, or the new beta tool that doesn't quite do what you want.

Not quite the same, but it did remind me of it.

fhrow4484 · 2026-03-05T21:06:31 1772744791

https://static0.anpoimages.com/wordpress/wp-content/uploads/...

yieldcrv · 2026-03-05T21:53:11 1772747591

Preview Road (only choice, and last preview was deprecated without warning)

hdjrudni · 2026-03-06T08:04:50 1772784290

If the last preview was 'deprecated', it's still usable. So you have two choices.

Peeve of mine when people say 'deprecated' but really they mean 'discontinued' or 'deleted'.

Things don't instantly disappear when they're deprecated.

yieldcrv · 2026-03-06T09:40:17 1772790017

Take it up with the organizations that use deprecated and break things immediately

goodmythical · 2026-03-05T23:15:02 1772752502

where's my nightly road?

Who knows, I might arrive before I depart.

CactusBlue · 2026-03-05T22:06:23 1772748383

Reminds me of Unity features

tymscar · 2026-03-06T00:58:07 1772758687

I still remember the massive shift to SDRP and HDRP. Honestly, now in retrospect, almost a decade later, I think it was clearly done wrong. It was a mess, and switching over was a multi-week procedure for anything more than a hello world program, and what you got in return wasn’t something that looked better, just something that had the potential to.

Similar story with the whole networking stack. I haven’t used Unity in years now after it being my main work environment for years, but the sour taste it left in my mouth by moving everything that worked in the engine into plugins that barely worked will forever remain there.

I'm sure it's partly a skill issue

fireant · 2026-03-06T03:12:21 1772766741

Don't forget that some of the new features are mutually incompatible. For example, a couple of years ago you couldn't use the "new ui system" with the "new input system", even when both were advertised as ready/almost ready

peab · 2026-03-05T23:33:24 1772753604

such a great meme

madeofpalk · 2026-03-05T22:52:32 1772751152

oh is this about my workplace?

L-four · 2026-03-05T21:15:32 1772745332

Gmail was in beta for 5 years, until 2009.

metalliqaz · 2026-03-05T21:43:58 1772747038

"Gemini, translate 'beta' from Googlespeak to English."

"Ok, here is the translation:"

    'we don't want to offer support'

cyanydeez · 2026-03-05T21:50:29 1772747429

Nah, it's "We don't want to provide a consistent model that we'll be stuck with supporting for a decade because it just takes up space; until we run everyone out of business, we can't afford to have customers tying their systems to any given model"

Really, the economics makes no sense, but that's what they're doing. You can't have a consistent model because it'll pin their hardware & software, and that costs money.

msikora · 2026-03-06T00:04:33 1772755473

I have a service that relies on NanoBanana Pro, but the availability has been so atrocious that we just might go back to OpenAI.

solarkraft · 2026-03-05T21:58:47 1772747927

Just like any Google product then.

kfse · 2026-03-06T04:00:51 1772769651

Until it had backup storage. Which ended up being useful in 2011 when tens of thousands of mailboxes were deleted due to a software bug and needed to be recovered from tape...

jsemrau · 2026-03-06T02:42:18 1772764938

It was a different company back then. The Internet was still new-ish, and Google was not the multi-trillion dollar company it is now. I'd think expectations are different.

cyanydeez · 2026-03-05T21:47:38 1772747258

The business models of LLMs don't include any guarantee, and somehow that's fine for a burgeoning decade of trillions of dollars of consumption.

Sure, makes total sense guys.

m_fayer · 2026-03-05T21:20:57 1772745657

My 5ish years in the mines of Android native back in the day are not years I recall fondly. Never change, Google.

jakub_g · 2026-03-05T21:05:41 1772744741

"Everything is beta or deprecated."

Aurornis · 2026-03-05T21:29:12 1772746152

> What a model mess! OpenAI now has three price points: GPT 5.1, GPT 5.2 and now GPT 5.4.

I don't know, this feels unnecessarily nitpicky to me

It isn't hard to understand that 5.4 > 5.2 > 5.1. It's not hard to understand that the dash-variants have unique properties that you want to look up before selecting.

Especially for a target audience of software engineers, skipping a version number is a common occurrence and never questioned.

IgorPartola · 2026-03-06T01:23:00 1772760180

The issue isn’t 5.4 > 5.2 etc. It is that there is a second dimension which is the model size and a third dimension which is what it is tuned for. And when you are releasing so quickly that your flagship instant mini model is on one numerical version but your flagship tool-calling mini model is on another, it is confusing trying to figure out which actual model you want for your use case.

It’s not impossible to figure out but it is a symptom of them releasing as quickly as possible to try to dominate the news and mindshare.

Aurornis · 2026-03-06T14:14:34 1772806474

> The issue isn’t 5.4 > 5.2 etc. It is that there is a second dimension which is the model size and a third dimension which is what it is tuned for.

All 3 models are tuned for general purpose work.

Model size isn’t how you pick which model to use. You pick based on performance in evals compared to price.

It’s not hard to imagine that the more expensive models are probably larger or have higher compute requirements.

Melatonic · 2026-03-05T22:09:24 1772748564

Agreed - and it's a huge step up from their previous naming schemes. That stuff was confusing as hell

__jl__ · 2026-03-05T22:27:46 1772749666

I see your point. I do find Anthropic's approach more clean though particularly when you add in mini and nano. That makes 5 models priced differently. Some share the same core name, others don't: gpt 5 nano, gpt 5 mini, gpt 5.1, gpt 5.2, gpt 5.4. And we are not even talking about thinking budget.

But generally: These are not consumer facing products and I agree that someone who uses the API should be able to figure out the price point of different models.

Reebz · 2026-03-06T02:25:03 1772763903

I don’t agree that it’s a nitpick - it’s a fundamental communication tool to users that describes capabilities and costs. Versioning is not the problem, but it amplifies the mess.

To be more direct on the point: Anthropic has nailed that Opus > Sonnet > Haiku.

com2kid · 2026-03-06T02:46:39 1772765199

> To be more direct on the point: Anthropic has nailed that Opus > Sonnet > Haiku.

Holy cow I never realized and I had to keep checking which model was which, I never had managed to remember which model was which size before because I never realized there was a theme with the names!

Aurornis · 2026-03-06T14:17:26 1772806646

> To be more direct on the point: Anthropic has nailed that Opus > Sonnet > Haiku.

How is this more clear than 5.4 > 5.2 > 5.1?

OpenAI used familiar numeric versioning instead of clever word names. Normally this choice would appeal to software devs, not gather criticism.

bibimsz · 2026-03-06T22:30:06 1772836206

I assume 5.4 is just the latest version. So if I'm on 5.1, I need to plan to upgrade to the latest version. I may assume the pricing is roughly the same, as well as the speed, and the purpose.

If I'm on Haiku, I don't assume I need to upgrade to Opus soon. I use Haiku for fast low reasoning, and Opus for slower more thoughtful answers.

And if I'm on Sonnet 4.5 and I see Sonnet 4.6 is coming out, I can reasonably assume it's more of a drop in upgrade, rather than a different beast.

Reebz · 2026-03-07T13:36:56 1772890616

Prod model suite: GPT-5.4, GPT-5.4Thinking, GPT-5.4Pro, GPT-5.3-Codex, GPT-5.3-Instant, GPT-5.2, GPT-5mini, GPT5-nano, GPT-4.1mini, GPT-4o(Omni), o4-mini, o4-mini-high.

Devoid of logic and structure.

They can't even decide where to place hyphens: is it GPT-5.4 Pro or GPT-5.3-Codex?

jbonatakis · 2026-03-05T22:26:42 1772749602

Google is already sending notices that the 2.5 models will be deprecated soon while all the 3.x models are in preview. It really is wild and peak Google.

abrookewood · 2026-03-06T03:07:51 1772766471

Public Service Announcement!! I don't know why the hell Google does this, but when they deprecate a model, the error you will see is a Rate Limit error. This has caught me out before and it is super annoying.

weird-eye-issue · 2026-03-06T04:16:04 1772770564

Do you mean when they remove a model you get that error? Because deprecation means it will be removed in the future but you can still use it

abrookewood · 2026-03-06T05:16:18 1772774178

Yes, sorry - you are correct. Once removed, that's the error, which is incredibly confusing. I spent way too long troubleshooting usage when 2.0 was removed before I figured it out.

weird-eye-issue · 2026-03-06T08:45:54 1772786754

Yes it should be a 404 error because most apps have retry logic on rate limit errors
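
A minimal sketch of why the status code matters, assuming a client that retries 429 and 5xx responses with exponential backoff. The retryable set, `call_with_retry`, and the two stub functions are hypothetical stand-ins for a real HTTP client; they are not any provider's SDK.

```python
import time
from typing import Callable

# Assumed retry policy for illustration, not any SDK's default.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def call_with_retry(send: Callable[[], int], max_attempts: int = 4,
                    base_delay: float = 0.0) -> tuple[int, int]:
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns (final_status, attempts_used). base_delay defaults to 0 so the
    sketch runs instantly; a real client would use something like 1.0.
    """
    status = send()
    attempts = 1
    while status in RETRYABLE_STATUSES and attempts < max_attempts:
        time.sleep(base_delay * 2 ** (attempts - 1))  # exponential backoff
        status = send()
        attempts += 1
    return status, attempts


# Hypothetical stubs for a provider whose model has been removed.
def removed_model_as_429() -> int:
    return 429


def removed_model_as_404() -> int:
    return 404


print(call_with_retry(removed_model_as_429))  # (429, 4)
print(call_with_retry(removed_model_as_404))  # (404, 1)
```

With a 429 the client spends its whole retry budget on a model that will never come back; with a 404 it fails on the first attempt and the real cause is visible immediately.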

boringg · 2026-03-05T22:47:18 1772750838

Like building on quicksand for dependencies. I guess though the argument is that the foundation gets stronger over time

bethekidyouwant · 2026-03-05T23:11:37 1772752297

What dependency could possibly be tied to a non-deterministic AI model? Just include the latest one at your price point.

jbonatakis · 2026-03-05T23:18:57 1772752737

Well it’s not even performance (define that however you will), but behavior is definitely different model to model. So while whatever new model is released might get billed as an improvement, changing models can actually meaningfully impact the behavior of any app built on top of it.

npn · 2026-03-06T04:15:39 1772770539

the problem is that the price point is increasing sharply every time.

gemini 2 flash lite was $0.3 per 1Mtok output, gemini 2.5 flash lite is $0.4 per 1Mtok output, guess the pricing for gemini 3 flash lite now.

yes, you guessed it right, it is $1.5 per 1Mtok output. you can easily guess that because google did the same thing before: gemini 2 flash was $0.4, then with 2.5 flash it jumped to $2.5.

and that is only the base price, in reality newer models are all thinking models, so it costs even more tokens for the same task.

at some point it stops being viable to use gemini api for anything.

and they don't even keep the old models for long.

deaux · 2026-03-06T03:39:23 1772768363

There's a whole universe of tasks that aren't "fix a Github issue" or even related to coding in the slightest. A large number of those tasks doesn't necessarily get better with model updates. In many cases, the performance is similar but with different behavior so you have to rewrite prompts to get the same. In some cases the performance is just worse. Model updates usually only really guarantee to be better at coding, and maybe image understanding.

0xbadcafebee · 2026-03-05T21:19:45 1772745585

> or have zero assurance that the model doesn't get discontinued within weeks

Why are you using the same model after a month? Every month a better model comes out. They are all accessible via the same API. You can pay per-token. This is the first time in, like, all of technology history, that a useful paid service is so interoperable between providers that switching is as easy as changing a URL.

phainopepla2 · 2026-03-05T21:52:11 1772747531

If you're trying to use LLMs in an enterprise context, you would understand. Switching models sometimes requires tweaking prompts. That can be a complete mess, when there are dozens or hundreds of prompts you have to test.

mr-pink · 2026-03-06T00:12:24 1772755944

sounds like job security. be careful what you wish for before you get automated

bethekidyouwant · 2026-03-05T23:13:20 1772752400

This sounds made up. Much like “prompt engineering”. Let’s hear an actual example

Koffiepoeder · 2026-03-06T00:50:00 1772758200

We have an OCR job running with a lot of domain specific knowledge. After testing different models we have clear results that some prompts are more effective with some models, and also some general observations (eg, some prompts performed badly across all models).

Sample size was 1000 jobs per prompt/model. We run them once per month to detect regression as well.

mistercheph · 2026-03-06T02:56:05 1772765765

While I believe that performance varies with respect to prompt, I have a seriously hard time believing that using the same prompt that was effective with the previous model would perform worse with the next generation of the same model from that lab and the same prompt.

deaux · 2026-03-06T03:49:17 1772768957

You shouldn't have a hard time believing it. There are thousands of different domains out there. You find it hard to believe that any of them would perform worse in your scenario?

Labs are still really optimizing for maybe 10 of those domains. At most 25 if we're being incredibly generous.

And for many domains, "worse" can hardly be benched. Think about creative writing. Think about a Burmese cooking recipe generator.

bethekidyouwant · 2026-03-06T15:12:14 1772809934

Bruh, how do you evaluate a batch of 1000 jobs against a x model for creative writing or cooking recipes? It’s vibes all the way down. This reeks like some kind of blog spam seo nonsense.

deaux · 2026-03-06T17:28:15 1772818095

The entire point is that you _don't_ for creative writing, vibes are the whole point, and those vibes often get worse across model updates for the same prompts.

gwd · 2026-03-06T00:19:08 1772756348

OK, so a while back I set up a workflow to do language tagging. There were 6-8 stages in the pipeline where it would go out to an LLM and come back. Each one has its own prompt that has to be tweaked to get it to give decent results. I was only doing it for a smallish batch (150 short conversations) and only for private use; but I definitely wouldn't switch models without doing another informal round of quality assessment and prompt tweaking. If this were something I was using in production there would be a whole different level of testing and quality required before switching to a different model.

0xbadcafebee · 2026-03-06T01:00:15 1772758815

The big providers are gonna deprecate old models after a new one comes out. They can't make money off giant models sitting on GPUs that aren't taking constant batch jobs. If you wanna avoid re-tweaking, open weights are the way. Lots of companies host open weights, and they're dirt cheap. Tune your prompts on those, and if one provider stops supporting it, another will, or worst case you could run it yourself. Open weights are now consistently at SOTA-level at only a month or two behind the big providers. But if they're short, simple prompts, even older, smaller models work fine.

mcint · 2026-03-06T00:03:42 1772755422

Enterprises moving slow, or preferring to remain on old technology that they already know how to work...is received wisdom in hn-adjacent computing, a truism known and reported for more than 3 decades (5 decades since the Mythical Man-Month).

Sounds like someone who's responsible, on the hook, for a bunch of processes, repeatable processes (as much as LLM driven processes will be), operating at scale.

Just in the open, tools like open-webui bolt on evals so you can compare how different models, including new ones, perform on the tasks that you in particular care about.

Indeed LLM model providers mainly don't release models that do worse on benchmarks—running evals is the same kind of testing, but outside the corporate boundary, pre-release feedback loop, and public evaluation.

https://chatgpt.com/share/69aa1972-ae84-800a-9cb1-de5d5fd7a4...

weird-eye-issue · 2026-03-06T04:17:40 1772770660

Tell us more about how you've never actually used these APIs in production

laichzeit0 · 2026-03-06T03:43:39 1772768619

Like, bro, do you think 5.x is a drop in replacement for 4.1? No it obviously wasn’t, since it had reasoning effort and verbosity and no more temperature setting, etc.

There’s no way you can switch model versions without testing and tweaking prompts, even the outputs usually look different. You pin it on a very specific version like gpt-5.2-20250308 in prod.
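
A rough sketch of enforcing that pinning rule in a config check, assuming snapshot IDs end in a date stamp such as `-20250308` or `-2024-08-06`. The regex, `is_pinned`, `check_prod_config`, and the config keys are all hypothetical; the first model ID is just the example from the comment above.

```python
import re

# Assumed convention: a pinned snapshot ends in -YYYYMMDD or -YYYY-MM-DD.
_SNAPSHOT_SUFFIX = re.compile(r"-(\d{8}|\d{4}-\d{2}-\d{2})$")


def is_pinned(model_id: str) -> bool:
    """True if the model ID carries a trailing date stamp."""
    return _SNAPSHOT_SUFFIX.search(model_id) is not None


def check_prod_config(config: dict[str, str]) -> list[str]:
    """Return the config keys whose model ID is a floating alias."""
    return [key for key, model_id in config.items() if not is_pinned(model_id)]


config = {
    "summarizer": "gpt-5.2-20250308",  # example ID from the comment above
    "classifier": "gpt-5.2",           # floating alias, gets flagged
}
print(check_prod_config(config))  # ['classifier']
```

Running a check like this in CI would catch an alias slipping into prod before a silent model swap changes the outputs.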

hobofan · 2026-03-05T22:12:08 1772748728

That's true only in theory, but not in practice. In practice every inference provider handles errors (guardrails, rate limits) somewhat differently and with different quirks, some of which only surface in production usage, and Google is one of the worst offenders in that regard.

abrookewood · 2026-03-06T03:09:48 1772766588

Because switching models requires testing, validation and shipping to Prod. Bloody annoying when the earlier model did everything I need and we are talking about a hobby project. I don't want to touch it every month - it's the same reason people use the LTS version of operating systems etc.

CobrastanJorji · 2026-03-05T21:46:19 1772747179

> Google essentially only has Preview models.

It's really nice to see Google get back to its roots by launching things only to "beta" and then leaving them there for years. Gmail was "beta" for at least five years, I think.

FINDarkside · 2026-03-05T22:48:52 1772750932

Also, GCP Cloud Run domain mapping, pretty fundamental feature for cloud product, has been in "preview" for over 5 years now.

jsmith99 · 2026-03-06T08:26:55 1772785615

It's still unavailable in many regions.

embedding-shape · 2026-03-05T21:11:35 1772745095

> OpenAI now has three price points: GPT 5.1, GPT 5.2 and now GPT 5.4.

I guess that's true, but geared towards API users.

Personally, since "Pro Mode" became available, I've been on the plan that enables that, and it's one price point and I get access to everything, including enough usage for codex that, as someone who spends a lot of time programming, I never manage to hit any usage limits, although I've gotten close once to the new (temporary) Spark limits.

jijji · 2026-03-06T02:49:13 1772765353

I tried to use Google's Gemini CLI from the command line on linux and I think it let me type in two sentences and then it told me that I was out of credits... and then I started reading comments that it would overwrite files destructively [0] or worse just try to rewrite an entire existing codebase [1]. it just doesn't sound ready for prime time. I think they wanted to push something out to compete with Claude code but it's just really really bad.

[0] https://github.com/google-gemini/gemini-cli/issues/17583

[1] https://www.reddit.com/r/Bard/comments/1l8vil5/gemini_keeps_...

fnordpiglet · 2026-03-06T04:26:24 1772771184

5.4 is the one fine tuned for autonomous mass murder, automated surveillance state, and money grabs at any cost. It’s really hard to lump that into the others as it’s a fairly unique and specialized feature set. You can’t really call it that tho so they have to use the numbers.

I’m pretty glad I’m out of the OpenAI ecosystem in all seriousness. It is genuinely a mess. This marketing page is also just literally all over the place and could probably be about 20% of its size.

beklein · 2026-03-05T23:02:50 1772751770

Not sure why you think Anthropic doesn't have the same problems? Their version numbers across different model lines jump around too... for Opus we have 4.6, 4.5, 4.1 then we have Sonnet at 4.6, 4.5, and 4.1? No version 4.1 here, and there is Haiku, no 4.6, but 4.5 and no 4.1, no 4 but then we only have old 3.5...

Also their pricing based on 5m/1h cache hits, cache read hits, additional charges for US inference (but only for Opus 4.6 I guess) and optional features such as more context and faster speed for some random multiplier is also complex and actually quite similar to OpenAI's pricing scheme.

To me it looks like everybody has similar problems and solutions for the same kinds of problems and they just try their best to offer different products and services to their customers.

selcuka · 2026-03-06T00:02:20 1772755340

With Anthropic you always have 3 models to choose from: Opus-latest, Sonnet-latest, and Haiku-latest, from the best/slowest to the worst/fastest.

The version numbers are mostly irrelevant as afaik price per token doesn't change between versions.

maxo99 · 2026-03-06T00:18:35 1772756315

Three random names aren't ideal. I often need to double check which is which. This is why we use numbers

dseravalli · 2026-03-06T00:35:47 1772757347

They aren't random. Opus's are very long poems, haikus are very short ones (3 lines), sonnets are in between (~14 lines)

oliwary · 2026-03-06T06:02:45 1772776965

What's next? Claude Iliad?

echoangle · 2026-03-06T00:33:35 1772757215

How are the names random?

https://en.wikipedia.org/wiki/Masterpiece

https://en.wikipedia.org/wiki/Sonnet

https://en.wikipedia.org/wiki/Haiku

They dropped the magnum from opus but you could still easily deduce the order of the models just from their names if you know the words.

svachalek · 2026-03-05T23:44:33 1772754273

It's much more consistent. Only 3 lines, numbered 4.6, 4.6, and 4.5, and it's clear they're tiers and not alternate product lines. It wasn't until recently that GPT seems to have any kind of naming convention at all and it's not intuitive if every version number is a whole different class of tool.

The pricing is more complex but also easy, Opus > Sonnet > Haiku no matter how you tweak those variables.

biophysboy · 2026-03-05T21:58:10 1772747890

Wow, is that what preview means? I see those model options in github copilot (all my org allows right now) - I was under the impression that preview means a free trial or a limited # of queries. Kind of a misleading name..

snug · 2026-03-05T23:58:52 1772755132

Pretty common to call something that isn't ready a preview

awad · 2026-03-05T23:03:29 1772751809

Incredibly curious how Google's approach to support, naming, versioning etc will mesh with the iOS integration.

abustamam · 2026-03-05T23:36:10 1772753770

I mean, Google notoriously discontinues even non-beta software, so if your concern is assurance that the model doesn't get discontinued, then you may as well just use whatever you want since GA could also get discontinued.

raincole · 2026-03-05T21:32:17 1772746337

They aggressively retire models, so GPT 5.1 and 5.2 are probably going to go soon.

hobofan · 2026-03-05T22:15:55 1772748955

In the Azure Foundry, they list GPT 5.2 retirement as "No earlier than 2027-05-12" (it might leave OpenAI's normal API earlier than that). I'm pretty certain that Gemini 3, which isn't even in GA yet, will be retired earlier than that.

arthurcolle · 2026-03-05T20:56:51 1772744211

There is a lot of opportunity here for the AI infrastructure layer on top of tier-1 model providers

motoxpro · 2026-03-05T21:32:10 1772746330

This is what clouds like AWS, Azure, and GCP solve (vertex AI, etc). They are already an abstraction on top of the model makers with distribution built in.

I also don't believe there is any value in trying to aggregate consumers or businesses just to clean up model makers names/release schedule. Consumers just use the default, and businesses need clarity on the underlying change (e.g. why is it acting different? Oh google released 3.6)

arthurcolle · 2026-03-05T22:13:18 1772748798

Do the end users really care about the models at all, or about the effects that the models can cause?

m3kw9 · 2026-03-05T21:52:46 1772747566

that's how they had it for years; it's a mess, but controlled

delaminator · 2026-03-05T20:59:51 1772744391

two great problems in computing

naming things

cache invalidation

off by one errors

rurban · 2026-03-05T22:27:40 1772749660

Biggest problem right now in computing:

Out of tokens until end of month

CamperBob2 · 2026-03-06T01:59:07 1772762347

More like, "Out of DRAM until end of world"

__jl__ · 2026-02-19T16:32:55 1771518775

Another preview release. Does that mean the recommended model by Google for production is 2.5 Flash and Pro? Not talking about what people are actually doing but the google recommendation. Kind of crazy if that is the case

__jl__ · 2026-02-05T18:16:28 1770315388

Impressive jump for GPT-5.3-codex and crazy to see two top coding models come out on the same day...

granzymes · 2026-02-05T18:20:41 1770315641

Insane! I think this has to be the shortest-lived SOTA for any model so far. Competition is amazing.

__jl__ · 2026-01-24T01:31:50 1769218310

Yes you can and I really like it as a feature. But it ties you to OpenAI…

__jl__ · 2025-12-17T17:47:56 1765993676

I will have to try that. Cursor bill got pretty high with Opus 4.5. Never considered opus before the 4.5 price drop but now it's hard to change... :)

diamondfist25 · 2025-12-17T18:40:04 1765996804

$100 Claude max is the best subscription I’ve ever had.

Well worth every penny now

vanviegen · 2025-12-18T08:29:24 1766046564

Or a $40 GitHub copilot plan also gets you a lot of Opus usage.

onoesworkacct · 2025-12-18T23:50:27 1766101827

Missing a lot without claude code tho

vanviegen · 2025-12-21T22:46:55 1766357215

I've tried both, and I'm still not sure. Claude Code steers more towards a hands-off, vibe coding approach, which I often regret later. With Copilot I'm more involved, which feels less 'magical' and takes me more time, but generally does not end in misery.

__jl__ · 2025-12-17T16:58:25 1765990705

This is awesome. No preview release either, which is great for production.

They are pushing the prices higher with each release though: API pricing is up to $0.5/M for input and $3/M for output

For comparison:

Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output

Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output

Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output

Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)

Gemini 3.0 Pro: $2.00/M for input and $12/M for output

Gemini 2.5 Pro: $1.25/M for input and $10/M for output

Gemini 1.5 Pro: $1.25/M for input and $5/M for output

I think image input pricing went up even more.

Correction: It is a preview model...
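
To put numbers on "pushing the prices higher with each release", here is a small sketch that computes the generation-over-generation multipliers from the prices listed above. The dictionary layout and the `multipliers` helper are just for illustration.

```python
# USD per 1M tokens as (input, output), copied from the list above.
FLASH = {
    "1.5": (0.075, 0.30),
    "2.0": (0.15, 0.60),
    "2.5": (0.30, 2.50),
    "3.0": (0.50, 3.00),
}
PRO = {
    "1.5": (1.25, 5.00),
    "2.5": (1.25, 10.00),
    "3.0": (2.00, 12.00),
}


def multipliers(prices: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Price ratio (input, output) between each consecutive pair of versions."""
    versions = list(prices)
    result = {}
    for prev, cur in zip(versions, versions[1:]):
        (prev_in, prev_out), (cur_in, cur_out) = prices[prev], prices[cur]
        result[f"{prev}->{cur}"] = (round(cur_in / prev_in, 2),
                                    round(cur_out / prev_out, 2))
    return result


print(multipliers(FLASH))
# {'1.5->2.0': (2.0, 2.0), '2.0->2.5': (2.0, 4.17), '2.5->3.0': (1.67, 1.2)}
print(multipliers(PRO))
# {'1.5->2.5': (1.0, 2.0), '2.5->3.0': (1.6, 1.2)}
```

From 1.5 Flash to 3.0 Flash that compounds to roughly 6.7x on input and 10x on output.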

mips_avatar · 2025-12-17T17:16:02 1765991762

I'm more curious how Gemini 3 flash lite performs/is priced when it comes out. Because it may be that for most non coding tasks the distinction isn't between pro and flash but between flash and flash lite.

KoolKat23 · 2025-12-17T22:34:17 1766010857

Token usage also needs to be factored in, specifically when thinking is enabled: these newer models find more difficult problems easier and use fewer tokens to solve them.

srameshc · 2025-12-17T17:05:50 1765991150

Thanks, that was a great breakdown of cost. I just assumed before that it was the same pricing. The pricing probably comes from the confidence and the buzz around Gemini 3.0 as one of the best performing models. But competition is hot in the area and it won't be long before we get similar performing models for a cheaper price.

YetAnotherNick · 2025-12-17T18:18:16 1765995496

For comparison, GPT-5 mini is $0.25/M for input and $2.00/M for output, so double the price for input and 50% higher for output.

AuthError · 2025-12-17T18:21:33 1765995693

flash is closer to sonnet than gpt minis though

martythemaniak · 2025-12-17T19:02:52 1765998172

The price increase sucks, but you really do get a whole lot more. They also had the "Flash Lite" series, 2.5 Flash Lite is 0.10/M, hopefully we see something like 3.0 Flash Lite for .20-.25.

sunaookami · 2025-12-17T18:36:57 1765996617

This is a preview release.

reed1234 · 2025-12-18T02:29:08 1766024948

https://openrouter.ai/google/gemini-3-flash-preview

uluyol · 2025-12-17T17:22:30 1765992150

Are these the current prices or the prices at the time the models were released?

__jl__ · 2025-12-17T17:45:30 1765993530

Mostly at the time of release except for 1.5 Flash which got a price drop in Aug 2024.

Google has been discontinuing older models after several months of transition period so I would expect the same for the 2.5 models. But that process only starts when the release version of 3 models is out (pro and flash are in preview right now).

misiti3780 · 2025-12-17T18:44:11 1765997051

is there a website where i can compare openai, anthropic and gemini models on cost/token ?

jsnell · 2025-12-17T19:52:10 1766001130

There are plenty. But it's not the comparison you want to be making. There is too much variability between the number of tokens used for a single response, especially once reasoning models became a thing. And it gets even worse when you put the models into a variable length output loop.

You really need to look at the cost per task. artificialanalysis.ai has a good composite score, measures the cost of running all the benchmarks, and has a 2D intelligence vs. cost graph.
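
A toy illustration of cost per task versus cost per token. The per-token prices are the GPT-5 mini and Gemini 3.0 Flash figures quoted earlier in the thread, but the token counts are invented for the example and are not measurements of either model; the sketch also assumes reasoning tokens are billed at the output rate.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int  # assumed to include reasoning tokens billed as output


def task_cost(usage: Usage, input_price: float, output_price: float) -> float:
    """USD cost of one task, given prices in USD per 1M tokens."""
    return (usage.input_tokens * input_price
            + usage.output_tokens * output_price) / 1_000_000


# Hypothetical: the cheaper-per-token model thinks for 12k tokens...
cheap_per_token = task_cost(Usage(2_000, 12_000), input_price=0.25, output_price=2.00)
# ...while the pricier-per-token model finishes the same task in 3k.
pricey_per_token = task_cost(Usage(2_000, 3_000), input_price=0.50, output_price=3.00)

print(f"{cheap_per_token:.4f}")   # 0.0245
print(f"{pricey_per_token:.4f}")  # 0.0100
```

Under these made-up token counts the model with the higher sticker price is less than half the cost per task, which is why a per-token price table alone can mislead.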

misiti3780 · 2025-12-17T21:13:27 1766006007

thanks

deaux · 2025-12-18T05:22:15 1766035335

For reference the above completely depends on what you're using them for. For many tasks, the number of tokens used is consistent within 10~20%.

deaux · 2025-12-18T05:22:58 1766035378

https://www.helicone.ai/llm-cost

Tried a lot of them and settled on this one, they update instantly on model release and having all models on one page is the best UX.

rrhartjr · 2025-12-18T03:07:32 1766027252

https://www.llm-prices.com/

int_19h · 2025-12-17T22:49:43 1766011783

https://openrouter.ai/models

__jl__ · 2025-11-18T15:30:38 1763479838

API pricing is up to $2/M for input and $12/M for output

For comparison:

Gemini 2.5 Pro was $1.25/M for input and $10/M for output

Gemini 1.5 Pro was $1.25/M for input and $5/M for output

raincole · 2025-11-18T15:54:43 1763481283

Still cheaper than Sonnet 4.5: $3/M for input and $15/M for output.

brianjking · 2025-11-18T15:57:03 1763481423

It is so impressive that Anthropic has been able to maintain this pricing still.

bottlepalm · 2025-11-18T18:08:46 1763489326

Claude is just so good. Every time I try moving to ChatGPT or Gemini, they end up making concerning decisions. Trust is earned, and Claude has earned a lot of trust from me.

Honestly, Google models have this mix of smart/dumb that is scary. Like if the universe is turned into paperclips then it'll probably be a Google model.

int_19h · 2025-11-18T23:07:36 1763507256

Well, it depends. Just recently I had Opus 4.1 spend 1.5 hours looking at 600+ sources while doing deep research, only to get back to me with a report consisting of a single sentence: "Full text as above - the comprehensive summary I wrote". Anthropic acknowledged that it was a problem on their side but refused to do anything to make it right, even though all I asked them to do was to adjust the counter so that this attempt doesn't count against their incredibly low limit.

epolanski · 2025-11-18T19:22:35 1763493755

Idk Anthropic has the least consistent models out there imho.

Aeolun · 2025-11-18T16:32:04 1763483524

Because every time I try to move away I realize there’s nothing equivalent to move to.

Alex-Programs · 2025-11-18T16:52:53 1763484773

People insist upon Codex, but it takes ages and has an absolutely hideous lack of taste.

sumedh · 2025-11-19T07:51:37 1763538697

It creates beautiful websites though.

andybak · 2025-11-18T17:25:33 1763486733

Taste in what?

js4ever · 2025-11-18T23:18:47 1763507927

Wines!

jhack · 2025-11-18T15:53:10 1763481190

With this kind of pricing I wonder if it'll be available in Gemini CLI for free or if it'll stay at 2.5.

xnx · 2025-11-18T16:53:03 1763484783

There's a waitlist for using Gemini 3 for Gemini CLI free users: https://docs.google.com/forms/d/e/1FAIpQLScQBMmnXxIYDnZhPtTP...

eevmanu · 2025-11-18T18:08:16 1763489296

In case anyone wants to confirm if this link is official, it is.

https://goo.gle/enable-preview-features

-> https://github.com/google-gemini/gemini-cli/blob/release/v0....

--> https://goo.gle/geminicli-waitlist-signup

---> https://docs.google.com/forms/d/e/1FAIpQLScQBMmnXxIYDnZhPtTP...

dktp · 2025-11-18T22:55:15 1763506515

It's interesting that grounding with search cost changed from

* 1,500 RPD (free), then $35 / 1,000 grounded prompts

to

* 1,500 RPD (free), then (Coming soon) $14 / 1,000 search queries

It looks like the pricing changed from per-prompt (previous models) to per-search (Gemini 3)
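
A quick break-even sketch for that change, using the two list prices above and ignoring the free 1,500 RPD tier. The function names and the searches-per-prompt figures are illustrative assumptions; how many searches a prompt actually triggers will vary.

```python
OLD_PER_1K_PROMPTS = 35.0   # USD per 1,000 grounded prompts (previous models)
NEW_PER_1K_SEARCHES = 14.0  # USD per 1,000 search queries (Gemini 3)


def old_cost(prompts: int) -> float:
    """Cost under per-prompt billing."""
    return prompts * OLD_PER_1K_PROMPTS / 1000


def new_cost(prompts: int, searches_per_prompt: float) -> float:
    """Cost under per-search billing, for an average number of searches per prompt."""
    return prompts * searches_per_prompt * NEW_PER_1K_SEARCHES / 1000


# Average searches per prompt at which both schemes cost the same.
BREAK_EVEN = OLD_PER_1K_PROMPTS / NEW_PER_1K_SEARCHES

print(BREAK_EVEN)           # 2.5
print(old_cost(10_000))     # 350.0
print(new_cost(10_000, 1))  # 140.0
print(new_cost(10_000, 4))  # 560.0
```

So the new scheme is cheaper only while a prompt averages fewer than 2.5 searches; agentic prompts that fan out into many queries would cost more than before.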

fosterfriends · 2025-11-18T16:36:28 1763483788

Thrilled to see the cost is competitive with Anthropic.

hirako2000 · 2025-11-18T15:37:47 1763480267

[flagged]

mupuff1234 · 2025-11-18T15:41:10 1763480470

I assume the model is just more expensive to run.

hirako2000 · 2025-11-18T16:05:13 1763481913

Likely. The point is we would never know.

__jl__ · 2025-11-18T15:30:03 1763479803

API pricing is up to $2/M for input and $12/M for output

For comparison:

Gemini 2.5 Pro was $1.25/M for input and $10/M for output

Gemini 1.5 Pro was $1.25/M for input and $5/M for output

__jl__ · 2025-11-18T14:05:52 1763474752

Same here. They have been aggressively increasing prices with each iteration (maybe because they started so low). Still hope that is not the case this time. GPT 5.1 is priced pretty aggressively so maybe that is an incentive to keep the current gemini API prices.

Deathmax · 2025-11-18T14:50:13 1763477413

Bad news then, they've bumped 3.0 Pro pricing to $2/$12 ($4/$18 at long context).

__jl__ · 2025-11-13T22:17:02 1763072222

The prompt caching change is awesome for any agent. Claude is far behind with increased costs for caching and manual caching checkpoints. Certainly depends on your application but prompt caching is also ignored in a lot of cost comparisons.

pants2 · 2025-11-13T22:31:30 1763073090

Though to be fair, thinking tokens are also ignored in a lot of cost comparisons and in my experience Claude generally uses fewer thinking tokens for the same intelligence