For what it's worth, at least for HPC-ish distributed computing, this sort of thing turns out not to be terribly worthwhile. We already have a standard for distributing computation, shared memory, I/O, and process startup in MPI (and, for instance, DMTCP to migrate the distributed application if necessary, though I think DMTCP needs a release).
I don't know what its current status is, but the HPC-ish Bproc system has/had an rfork [1]. Probably the most HPC-oriented SSI system, Kerrighed died, as did the Plan 9-ish xcpu, though that was a bit different.
The biggest benefit is arguably that codes designed for "telefork" (and perhaps remote threads) can also scale down to a single shared-memory machine and run way more efficiently than if they had been written against MPI, while adding little or no overhead when running on a cluster, assuming the codes are designed properly.
Just doing a fork may be sufficient for something embarrassingly parallel, but the interesting problems are tightly coupled. Obviously MPI scales down to a single node (a distributed system anyway, these days), typically as real forked processes, but possibly with all the ranks in a single process, given an appropriate implementation.
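To illustrate the embarrassingly-parallel case where a plain fork really is enough: a minimal sketch using Python's stdlib multiprocessing (fork-based on Unix) rather than MPI. The `work` function is a made-up placeholder for an independent unit of computation.

```python
# Sketch: embarrassingly parallel work via plain forked processes.
# Each worker gets an independent chunk of the input and there is no
# inter-process communication -- which is exactly what makes it easy.
from multiprocessing import Pool

def work(x):
    # Placeholder for an independent unit of computation.
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:               # 4 forked worker processes
        results = pool.map(work, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The moment the workers need to exchange data mid-computation (the tightly coupled case), this model stops being sufficient and you are back to message passing or shared memory.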
Citation needed, as they say, for "run way more efficiently", particularly as the conventional wisdom favours shared memory in a single process (e.g. OpenMP) on a single node.
“Acknowledgements: ... NUMA and Amdahl’s Law, for holding OpenMP back and keeping MPI-only competitive in spite of the ridiculous cost of Send-Recv within a shared-memory domain.” — Jeff Hammond, ‘MPI+MPI’
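The point of that quote is that Send-Recv between ranks on the same node still pays a serialize-and-copy cost, while threads in one process just read the same memory. A toy contrast in stdlib Python terms (standing in for MPI vs. OpenMP, not using either):

```python
# Toy contrast: a process must have data serialized and copied through a
# pipe, while a thread in the same process reads the shared object directly.
import threading
from multiprocessing import Pipe, Process

def child(conn):
    # Receives a pickled copy of the list, sends back a pickled result.
    conn.send(sum(conn.recv()))

if __name__ == "__main__":
    data = list(range(1000))

    # Thread: shared address space, no copy of `data`.
    totals = []
    t = threading.Thread(target=lambda: totals.append(sum(data)))
    t.start(); t.join()

    # Process: `data` is pickled, pushed through the pipe, and unpickled.
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send(data)
    result = parent_conn.recv()
    p.join()

    assert totals[0] == result == sum(range(1000))
```

Whether the copy actually dominates depends on message sizes and the MPI implementation (many use shared-memory transports intra-node), which is part of why the OpenMP-vs-MPI-only question stays contested.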
1. https://www.penguinsolutions.com/computing/documentation/scy...