
I'm a huge fan of using Lambda to perform hundreds of thousands of discrete tasks in a fraction of the time it'd take to perform those same tasks locally. A while back I used Lambda and SQS to cross-check 168,000 addresses with my ISP's gigabit availability tool.[1] If I recall correctly each check took about three seconds, but running all 168,000 checks on Lambda only took a handful of minutes. I believe the scraper was written in Python, so I shudder to think about how long it would have taken to run on a single machine.

[1] https://dstaley.com/2017/04/30/gigabit-in-baton-rouge/



> I believe the scraper was written in Python, so I shudder to think about how long it would have taken to run on a single machine.

Scraping is an embarrassingly perfect scenario for coroutines. Most asynchronous frameworks even use scraping as one of the examples.

In short, it would probably be done in 15 minutes, assuming you don’t get throttled quickly. If the tool wasn’t already async capable, another 15 minutes to wrap some scraping in gevent/eventlet.
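To illustrate the point, here's a rough sketch using Python's stdlib asyncio (rather than gevent/eventlet, but the idea is the same). The `check_address` function and the 10 ms delay are stand-ins for the real ~3 s availability lookup; the names and numbers are hypothetical:

```python
import asyncio
import time

async def check_address(address: str) -> str:
    # Stand-in for the real availability check: each lookup is mostly
    # network wait, which is exactly where coroutines shine.
    await asyncio.sleep(0.01)  # scaled-down stand-in for the ~3 s check
    return f"{address}: available"

async def check_all(addresses, limit=100):
    # Bound concurrency so you don't hammer the target and get throttled.
    sem = asyncio.Semaphore(limit)

    async def bounded(addr):
        async with sem:
            return await check_address(addr)

    return await asyncio.gather(*(bounded(a) for a in addresses))

if __name__ == "__main__":
    addresses = [f"{n} Example St" for n in range(1000)]
    start = time.perf_counter()
    results = asyncio.run(check_all(addresses))
    elapsed = time.perf_counter() - start
    # 1000 sequential 10 ms waits would take ~10 s; with 100 in flight
    # at a time this finishes in roughly a tenth of that.
    print(f"{len(results)} checks in {elapsed:.2f}s")
```

The semaphore is the part that matters for the throttling caveat: it caps in-flight requests at a polite number while still overlapping all the network waits.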


Even without async it's pretty easy to slap a concurrent.futures ThreadPoolExecutor on something normally single-threaded and get massive performance gains.
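That retrofit is often just a few lines. A minimal sketch, where `check_address` and its 10 ms sleep are a hypothetical stand-in for the blocking lookup:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def check_address(address: str) -> str:
    # Stand-in for the blocking network lookup (hypothetical, scaled down).
    time.sleep(0.01)
    return f"{address}: available"

addresses = [f"{n} Example St" for n in range(200)]

start = time.perf_counter()
# pool.map keeps the results in input order, so it's a drop-in
# replacement for the plain `for` loop it speeds up.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(check_address, addresses))
elapsed = time.perf_counter() - start

print(f"{len(results)} checks in {elapsed:.2f}s")
```

Because the work is I/O-bound, the GIL barely matters here: the threads spend nearly all their time waiting on the network, so 50 workers give close to a 50x speedup over the sequential loop.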



