Agreed - I priced this out for a specific distributed inference task a few months ago and the T4 was cheaper and faster than CPU.
On 26 million images using a PyTorch model with 41 million parameters, T4 instances were about 32% cheaper than CPU instances and took about 45% of the time, even after accounting for extra GPU startup time.
Many businesses/services can't saturate the hardware you describe. It's just too much compute power. With CPUs you can scale down to fit your actual needs: all the way down to a single AVX-512 core doing maybe 24 inferences per second (costing a few dollars PER MONTH).
Also, your cost/inference results will change if you use a fast CPU inference engine, instead of something slow like PyTorch (which you appear to be using).
Thanks - this is something I wasn’t familiar with. Do you have any pointers for CPU inference engines that you’ve had good experience with or that I can look into further?
https://cloud.google.com/compute/gpus-pricing