Hacker News

I built such a toy cluster once to see for myself, and gave up. It is too slow to do anything serious. You are much better off just buying an older post-lease server. Sure, it will consume more power, but you will finish more tasks in less time, so the advantage of using ARM in that case may be negligible. If it were Apple's M1 or M2, that would be a different story, but the RPi4 and its clones are not there yet.


Overall, I think people tend to underestimate the overhead of clustering. It's always significantly faster to run a computation on one machine than to spread it over N machines, each with 1/N of the power.

That's not always a viable option because of hardware costs, and sometimes you want redundancy, but those concerns are on an axis orthogonal to performance.
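The "N machines at 1/N power" point can be sketched with a toy cost model. All the numbers below are illustrative assumptions, not measurements: perfectly divisible work, plus a fixed communication cost per node.

```python
# Toy model: one machine of power P vs. N nodes of power P/N each.
# The compute time is identical; the cluster only adds communication
# overhead, which grows with the number of nodes. (Illustrative only.)

def single_machine_time(work, power):
    return work / power

def cluster_time(work, power, n, comm_per_node):
    # Each node does work/n at speed power/n -> same as work/power:
    # no compute win from splitting, only added coordination cost.
    compute = (work / n) / (power / n)
    return compute + comm_per_node * n

print(single_machine_time(1000, 10))    # 100.0
print(cluster_time(1000, 10, 8, 0.5))   # 104.0 -- always >= single machine
```

Under this (deliberately simple) model the cluster can never beat the single machine; real workloads deviate mainly in how much of the work is truly independent.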


The lines get blurred when you are on a supercomputer interconnect with a global address space, or even RDMA.


The fastest practical interconnects are roughly 1/10th the speed of local RAM. Because of that, even if you have an interconnect, you don't use it as remote RAM (through virtual memory).
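The rough ratio can be checked with back-of-the-envelope arithmetic. The figures below are illustrative assumptions (one fast interconnect link vs. typical aggregate server DRAM bandwidth), not benchmarks:

```python
# Back-of-the-envelope: interconnect vs. local DRAM bandwidth.
# Numbers are rough assumptions for a modern server, not measurements.

interconnect_gbit = 200                      # e.g. one 200 Gb/s link
interconnect_gbyte = interconnect_gbit / 8   # 25 GB/s
local_dram_gbyte = 200                       # rough multi-channel aggregate

ratio = interconnect_gbyte / local_dram_gbyte
print(f"interconnect is ~{ratio:.0%} of local RAM bandwidth")
```

Latency makes the gap worse than bandwidth alone suggests: a local DRAM access is on the order of 100 ns, while a network round trip is microseconds, so paging over the interconnect stalls the CPU for thousands of cycles.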

I don't think anybody in the HPC business really pursued mega-SMP after SGI because it was not cost-effective for the gains.


Both single-system-image and giant NUMA machines were, and still are, pursued, because not everything scales well in a shared-nothing message-passing model (some workloads straddle the line by doing distributed shared memory over MPI, but using it mostly for synchronisation).

It's just that there's a range of very well-paying problems that scale quite well on message-passing systems. This means that even if your problem scales badly on them, you might have an easier time brute-forcing the task on a larger but inefficient supercomputer than getting funding for a smaller, more efficient one that fits your problem better.
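The kind of problem that scales well in message passing is one where each worker computes locally and exchanges only small results. A minimal sketch using Python's multiprocessing as a stand-in for MPI (the chunking and single-result-per-worker pattern is the point, not the library):

```python
# Message-passing sketch: each worker receives a chunk, computes locally,
# and sends back one small message. Workloads dominated by local compute
# like this scale well; workloads needing fine-grained shared state don't.
# (On platforms using the 'spawn' start method this needs a __main__ guard.)
from multiprocessing import Process, Queue

def worker(chunk, out):
    out.put(sum(x * x for x in chunk))   # purely local computation

def parallel_sum_squares(data, n_workers=4):
    out = Queue()
    step = len(data) // n_workers
    procs = [Process(target=worker, args=(data[i * step:(i + 1) * step], out))
             for i in range(n_workers)]
    for p in procs:
        p.start()
    total = sum(out.get() for _ in procs)  # one small message per worker
    for p in procs:
        p.join()
    return total

print(parallel_sum_squares(list(range(1000))))  # -> 332833500
```

By contrast, an algorithm that needs every worker to read and update a shared structure on every step pays a message per access, which is exactly the case where the giant NUMA box wins.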


Cray did some vector machines that were globally addressed but not coherent. That’s an interesting direction. So is latency hiding.

The really important thing is that the big 'single machine' you're talking about already has NUMA latency problems. Sharing a chassis doesn't actually save you from needing to tackle latency at scale.


Well, a complete M1 board, which is roughly half the size of an iPhone mini, is fast enough. It's also super efficient. So I'm still waiting for Apple to announce their cloud.

They're currently putting M-series chips in every device they make, even the monitors. It'll be the base system for any electronic device. I'm sure we'll see more specialized devices for different applications, because at this point the hardware, and the software stack along with it, is compact, fast, and secure enough for anything.

Hello Apple Fridge


Fast enough for what?



