Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> there is a subset of embarrassingly parallel problems that are are heavily data intensive.

There's an even smaller subset which is one-shot data access.

> in examples like this where the time required to read the data and send the data over the network is much larger than the time required to compute the data once the data is in memory

The annoying thing about lambda and other functional alternatives is that data-access patterns tend to be repetitive in somewhat predictable ways & there is no way to take advantage of that fact easily.

However, if you don't have that & say you were reading from S3 for every pass, then lambda does look attractive because the container lifetime management is outsourced - but if you do have even temporal stickiness of data, then it helps to do your own container management & direct queries closer to previous access, rather than to entirely cold instances.

If there's a thing that hadoop missed out on building into itself, it was a distributed work queue with functions with slight side-effects (i.e memoization).



> If there's a thing that hadoop missed out on building into itself, it was a distributed work queue with functions with slight side-effects (i.e memoization).

Is that not named spark ? :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: