IME tok/s is only useful with the additional context of ttft and total latency. ...

Rapzid · 2026-03-17T22:49:05 1773787745

Exactly. Really frustrating they don't advertise TTFT and etc, and that it's really hard to find any info in that regard on newer models.

For voice agents gpt-4.1 and gpt-4.1-mini seem to be the best low latency models when you need to handle bigger data more complex asks.

But they are a year old and trying to figure out if these new models(instant, chat, realtime, mini, nona, wtf) are a good upgrade is very frustrating. AFAICT they aren't; the TTFT latencies are too high.