
GPT-4 (and Claude) are definitely the top models out there, but Llama, even the 8B, is more than capable of handling extraction like this. I've pumped absurd batches through it via vLLM.
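Roughly, the shape of it with vLLM (the model ID, prompt template, and documents here are just placeholders, not my actual pipeline):

    # Batch extraction sketch with vLLM; model and prompt are illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
    params = SamplingParams(temperature=0.0, max_tokens=256)

    docs = ["...document 1...", "...document 2..."]
    prompts = [
        f"Extract the key fields from this document as JSON:\n\n{d}"
        for d in docs
    ]

    # vLLM schedules the whole list through the GPU with continuous batching.
    outputs = llm.generate(prompts, params)
    for out in outputs:
        print(out.outputs[0].text)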

With serverless GPUs, the cost has been basically nothing.



Can you explain a bit more about what "serverless GPUs" are exactly? Is there a specific cloud provider you're thinking of? E.g., is there a GPU product on AWS? Googling gives me SageMaker, which is perhaps what you're referring to?


There are a few companies out there that provide it, Runpod and Replicate being the two that I've used. If you've ever used AWS Lambda (or any other FaaS) it's essentially the same thing.

You ship your code as a container built around a small library they provide that lets them invoke it, and you're billed per second of execution time.
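For Runpod, assuming their Python SDK's handler pattern, the worker is roughly this shape (the input schema and handler body are placeholders for illustration):

    import runpod

    def handler(job):
        # "input" is whatever payload you submit to the endpoint.
        prompt = job["input"].get("prompt", "")
        # Placeholder for real inference; load your model at module
        # scope so it stays warm across invocations.
        return {"output": f"echo: {prompt}"}

    # Hands control to Runpod's job loop; you're billed while it runs jobs.
    runpod.serverless.start({"handler": handler})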

Like most FaaS, if your load is steady-state it's more expensive than just spinning up a GPU instance.

If your use-case is more on-demand, with a lot of peaks and troughs, it's dramatically cheaper. Particularly if your trough frequently goes to zero. Think small-scale chatbots and the like.

Runpod, for example, would cost $3.29/hr or ~$2400/mo for a single H100. I can use their serverless offering instead for $0.00155/second. I get the same H100 performance, but it's not sitting around idle (read: costing me money) all the time.
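Rough break-even math on those two rates (prices copied from above; the rest is arithmetic):

    on_demand_per_hr = 3.29                        # dedicated H100, $/hr
    serverless_per_sec = 0.00155                   # $/s of actual execution
    serverless_per_hr = serverless_per_sec * 3600  # ~$5.58/hr while running

    # Serverless is cheaper whenever your utilization is below this:
    print(f"{on_demand_per_hr / serverless_per_hr:.0%}")  # ~59%

So if the GPU would be busy less than roughly 59% of the time, serverless wins; above that, a dedicated instance is cheaper.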


You can check out this technical deep dive on serverless GPU offerings and pay-as-you-go pricing.

It includes benchmarks around cold starts, performance consistency, scalability, and cost-effectiveness for models like Llama 2 7B and Stable Diffusion across different providers: https://www.inferless.com/learn/the-state-of-serverless-gpus... It could save you months of evaluation work. Do give it a read.

P.S.: I am from Inferless.



