Have you considered using Gemini? Google seems to be on a hot streak with their ...

anonyfox · 2026-03-08T17:50:54 1772992254

because gemini, despite what stats say, still produces garbage once the problem gets harder. it nails it for lab conditions, but messy reality or creativity or even code quality is a far cry from opus or the latest gpt5.4 by a long shot. and always has been. its pretty good inside the GSuite because of integrations, but standalone its near worthless compared to even grok-code-fast which doesn't think much at all (but damn it is fast). At this point google keeps throwing noodlepots with AI against every wall in reach to see what sticks, which is more kind of desperation that still works to increase wall street highscores, but not exactly a streak or breakthrough. just rapid fire shotgun launches to see if anything sticks. No one serious talks Gemini because its not even worth considering still for real things outside shiny presentations and artificial benchmarks.

baq · 2026-03-08T18:13:44 1772993624

Gemini schools the other two when doing code reviews.

I used to think tokens are a commodity, but it’s becoming clear that the jagged frontier is different enough even for the easiest use case of SWE that there’s room for having two if not three providers of different foundational models. It isn’t a winner takes all, they’re all winning together. Cursor isn’t properly taking advantage of the situation yet.

Grimblewald · 2026-03-09T02:26:32 1773023192

My experience exactly. The more "real" the problems become, the more other models become unsuitable when compared to claude, with the sole exceptions being deepseek/kimi, which while speaking strictly w.r.t metrics and basic tasks are not better, they are more interesting and handle more odd and totally out of domain stuff better than the US models. An example being code i wrote for a hypercomplex sedenion based artififial neural network broke claude so bad it start saying it is chatgpt and cant evaluate/run code. similar experience for all US models, which are characterized by being extremely brittle at the fringes, though cladue least among them. Meanwhile chinese models are less capable for cookie cutter stuff but keep swinging when things get really weird and unusual. It's like US models optimize for the lowest minima acheivable, and god help you if distribution changes. Chinese models on the otgerhand seem to optimize for the flattest minima, giving poorer quality across the board but far more robust behaviour.

igor47 · 2026-03-08T17:51:46 1772992306

I've tried. It's just not very good compared to either mentioned alternative.

siliconc0w · 2026-03-08T18:25:23 1772994323

I can't even use 3.1 with Gemini CLI, not sure why.