Agreed - I priced this out for a specific distributed inference task a few months ago and the T4 was cheaper and faster than CPU.
On 26 million images using a PyTorch model with 41 million parameters, T4 instances were about 32% cheaper than CPU instances and took about 45% of the time, even after accounting for extra GPU startup time.
Many businesses/services can't saturate the hardware you describe. It's just too much compute power. With CPUs you can scale down to fit your actual needs: all the way down to a single AVX-512 core doing maybe 24 inferences per second (costing a few dollars PER MONTH).
Also, your cost/inference results will change if you use a fast CPU inference engine, instead of something slow like PyTorch (which you appear to be using).
Thanks - this is something I wasn’t familiar with. Do you have any pointers for CPU inference engines that you’ve had good experience with or that I can look into further?
https://cloud.google.com/compute/gpus-pricing