
If you self-host, you likely won't have anywhere near enough volume to batch efficiently, so you'll end up bottlenecked on memory bandwidth rather than compute.

E.g. based on the calculations in https://www.tensoreconomics.com/p/llm-inference-economics-fr..., increasing batch size from 1 to 64 cuts the cost per token to 1/16th.
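The intuition can be sketched with a toy cost model (my own illustrative numbers, not taken from the linked article): during decoding, each step streams the full model weights from memory once, and that fixed cost is amortized across every sequence in the batch, while per-token compute stays constant. The `per_token_compute` value below is picked so the batch-1 to batch-64 ratio roughly matches the article's 16x figure.

```python
def cost_per_token(batch_size, weight_read_cost=1.0, per_token_compute=0.05):
    """Toy per-token cost: weight streaming is paid once per decoding
    step and shared across the batch; compute scales per token.
    All numbers are illustrative, not measured."""
    return weight_read_cost / batch_size + per_token_compute

for b in (1, 8, 64):
    print(f"batch={b:3d}  cost/token={cost_per_token(b):.4f}")

# Ratio between batch sizes 1 and 64 under these toy parameters:
print(cost_per_token(1) / cost_per_token(64))  # ~16x
```

At small batch sizes the weight-streaming term dominates, which is why a single self-hosted user pays nearly the full memory-bandwidth cost per token while a provider batching dozens of requests amortizes it away.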


