Hacker News

GCP lists a T4, which is suitable for inference, at between $0.11/hour and $0.35/hour (depending on commitment duration and preemptibility).

https://cloud.google.com/compute/gpus-pricing



Agreed - I priced this out for a specific distributed inference task a few months ago and the T4 was cheaper and faster than CPU.

On 26 million images using a PyTorch model with 41 million parameters, T4 instances were about 32% cheaper than CPU instances, and took about 45% of the time even after accounting for extra GPU startup time.
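The ratios above can be sketched as back-of-envelope arithmetic. The absolute dollar and hour figures below are hypothetical placeholders; only the ratios (32% cheaper, 45% of the wall-clock time, 26 million images) come from the comment:

```python
# Hypothetical baseline for the CPU run; only the ratios are from the comment.
cpu_total_cost = 100.0   # assume $100 for the full 26M-image CPU run
cpu_total_hours = 20.0   # assume 20 hours wall-clock on CPU instances

t4_total_cost = cpu_total_cost * (1 - 0.32)   # "about 32% cheaper"
t4_total_hours = cpu_total_hours * 0.45       # "about 45% of the time"

# Per-image cost under these assumed numbers
t4_cost_per_image = t4_total_cost / 26_000_000

print(f"T4 run: ${t4_total_cost:.2f} over {t4_total_hours:.1f} h "
      f"(${t4_cost_per_image:.2e}/image)")
```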



Yes, I’m using “T4” as a shorthand for “instances otherwise matched to the CPU-based instances but which also have a T4 GPU.”


OP was most likely referring to the AWS EC2 instance type T4g, which runs on Amazon Graviton processors IIRC.


> GCP lists a T4

GCP not AWS, or are you talking about a different OP?


Many businesses/services can't saturate the hardware you describe. It's just too much compute power. With CPUs you can scale down to fit your actual needs: all the way down to a single AVX-512 core doing maybe 24 inferences per second (costing a few dollars PER MONTH).

Also, your cost/inference results will change if you use a fast CPU inference engine, instead of something slow like PyTorch (which you appear to be using).
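The effect of a faster engine on cost per inference is just throughput arithmetic: on the same instance, a 3x-faster engine is 3x cheaper per inference. The prices and the 3x speedup below are hypothetical, not benchmarks:

```python
# Sketch: same instance, different engine throughput, different $/inference.
def cost_per_million(instance_price_per_hour, inferences_per_second):
    """Dollars per one million inferences at a given sustained throughput."""
    inferences_per_hour = inferences_per_second * 3600
    return instance_price_per_hour / inferences_per_hour * 1_000_000

# Hypothetical numbers: $0.10/hour instance; an optimized CPU engine
# running 3x faster than an eager-mode PyTorch baseline.
baseline = cost_per_million(0.10, 50)
optimized = cost_per_million(0.10, 150)

print(f"baseline: ${baseline:.4f}/1M, optimized: ${optimized:.4f}/1M")
```

The instance price cancels out of the ratio, so the relative saving depends only on the speedup factor.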


Thanks - this is something I wasn’t familiar with. Do you have any pointers for CPU inference engines that you’ve had good experience with or that I can look into further?


Disclosure: I work for Neural Magic.

Hi carbocation, we'd love to see what you think of the performance using the DeepSparse engine for CPU inference: https://github.com/neuralmagic/deepsparse

Take a look through our getting started pages that walk through performance benchmarking, training, and deployment for our featured models: https://sparsezoo.neuralmagic.com/getting-started



