> And so for data processing/streaming/batch [...] serverless actually does work out pretty well.
This is my field of expertise. Serverless in the sense of Lambda/functions is not usable for serious analytics pipelines, because the max allowed image size is smaller than the smallest NLP models or even lightweight analytics Python distributions. You can't use Lambda on the ETL side, and you can't use it on the query side unless your queries are trivial enough to be piped straight through to the underlying store. And if your workload is trivial, you should just use ClickHouse or straight-up Postgres, because they vastly outperform serverless stacks in cost and performance[1].
For non-trivial pipelines, tools like Spark and Dask dominate. And it just so happens that both have plugins to provision their own resources through Kubernetes instead of messing around with serverless/PaaS noise.
IaaS is the peak value proposition of cloud vendors. Serverless/PaaS are grossly overpriced products aimed at non-technical audiences and are mostly snake oil. Change my mind.
The issue of the application artifact size is definitely real, and it blocks some NLP/ML workloads for sure. Consider that a present-day limitation rather than a hard one in Lambda.
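To make the artifact size point concrete, here is a minimal sketch that measures an unpacked dependency directory against a size cap. The 250 MB figure is an assumption based on the documented unzipped limit for zip-based Lambda deployment packages; check the current AWS quotas before relying on it. The function names are hypothetical, chosen for illustration only.

```python
import os

# Assumption: 250 MB unzipped limit for zip-based Lambda deployment packages.
# Verify against the current AWS Lambda quotas; this is not read from any API.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root, limit=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the unpacked directory is within the assumed limit."""
    return directory_size_bytes(root) <= limit
```

Pointing `fits_in_lambda` at a virtualenv's `site-packages` directory with a typical analytics stack installed gives a rough idea of how close a given distribution comes to the cap.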
Missosoup, I see you making changes to your comment, and that greatly changes the tone/context. I won't adjust my own reply to follow suit, but will leave it as it was for your original comments on this.
I'm not going to elaborate further on my comment now. Please feel free to edit yours or post another to answer anything I raised. Your original reply containing some generic sales brochures isn't what I expected from someone representing AWS stepping into this discussion.
That article appears to be discussing a migration from Redshift to ClickHouse. Redshift is a managed data warehouse, not a serverless solution in the same vein as Lambda.
I don't understand the point you are trying to make.
Edit: The comment I am replying to was originally just 'Please explain' and a link to the article in question, and contained no other context or details.
ClickHouse is a really strange thing to compare to Lambda here. One is a method of performing small compute jobs, the other is an analytics database. They serve vastly different functions, and saying "ClickHouse or Postgres is cheaper and more performant than Lambdas" is nonsensical.
And PaaS products, well.
https://weekly-geekly.github.io/articles/433346/index.html
>One table instead of 90
>Service requests are executed in milliseconds
>The cost has decreased by half
>Easy removal of duplicate events
Please explain.
[1] https://blog.cloudflare.com/http-analytics-for-6m-requests-p...