
OpenAI has a "flex" processing tier that works like the normal API, except you accept higher latency and higher error rates in exchange for 50% off (the same discount as batch pricing). It also supports prompt caching for further savings.

For me, it works quite well for low-priority things, without the hassle of using the batch API. Usually the added latency is just a few seconds, so it would still work in an agent loop (and you can retry any requests that fail at the "normal" priority tier; see the sketch below).

https://developers.openai.com/api/docs/guides/flex-processin...
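A rough sketch of the flex-with-fallback pattern using the Python SDK, assuming flex is selected via the service_tier request parameter as the linked docs describe; the model name, timeout value, and exception handling here are illustrative, not prescriptive:

    from openai import OpenAI, APITimeoutError, RateLimitError

    # Flex requests can queue for a while, so raise the client
    # timeout well above the default.
    client = OpenAI(timeout=900.0)

    def complete(messages, model="o4-mini"):
        try:
            # Ask for the discounted flex tier first.
            return client.chat.completions.create(
                model=model,
                messages=messages,
                service_tier="flex",
            )
        except (APITimeoutError, RateLimitError):
            # Flex is overloaded or too slow; retry once at
            # normal priority instead of failing the loop.
            return client.chat.completions.create(
                model=model,
                messages=messages,
                service_tier="default",
            )

In practice you would probably add backoff before abandoning flex, since the fallback call is billed at full price.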

That's interesting, but it's a beta feature, so it could go away at any time. It's also not available for the Codex agentic models (or the Pro models, for that matter).


