Openai has a "flex" processing tier, which works like the normal API, but where ...

Openai has a "flex" processing tier, which works like the normal API, but where you accept higher latency and higher error rates, in exchange for 50% off (same as batch pricing). It also supports prompt caching for further savings.

For me, it works quite well for low-priority things, without the hassle of using the batch API. Usually the added latency is just a few seconds extra, so it would still work in an agent loop (and you can retry requests that fail at the "normal" priority tier.)

https://developers.openai.com/api/docs/guides/flex-processin...