I think it's worth it, although it might be best to wait for the next iteration: there are rumors the M4 Macs will support up to 512GB of memory [1].
The current 128GB (M3 Max) and 192GB (M2 Ultra) Macs can already run large models. For example, on the M2 Ultra the Qwen 110B model, 4-bit quantized, gets almost 10 t/s using Ollama [2] and other tools built on llama.cpp.
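If you want to sanity-check a t/s number yourself, a minimal sketch against Ollama's local HTTP API (default port 11434) looks something like this; the eval_duration field it returns is in nanoseconds:

    import requests

    # Non-streaming request so we get the eval stats in one JSON object.
    # Assumes qwen:110b [2] has already been pulled with `ollama pull`.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen:110b", "prompt": "Why is the sky blue?", "stream": False},
    ).json()

    # eval_count = generated tokens, eval_duration = generation time in ns
    tokens_per_sec = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    print(f"{tokens_per_sec:.1f} t/s")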
There's also the benefit of being able to load different models simultaneously, which is becoming important for RAG and agent-related workflows (rough sketch below).
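As a rough illustration of the multi-model point: with enough unified memory you can keep an embedding model and a generation model resident at once and hit them back to back. The model names here are just examples, swap in whatever you've pulled:

    import requests

    OLLAMA = "http://localhost:11434"

    # Embed a document chunk with a small embedding model...
    emb = requests.post(
        f"{OLLAMA}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": "some document chunk"},
    ).json()
    # ...use emb["embedding"] for retrieval, then generate with the big model,
    # without unloading one to make room for the other.
    answer = requests.post(
        f"{OLLAMA}/api/generate",
        json={"model": "qwen:110b",
              "prompt": "Answer using the retrieved context: ...",
              "stream": False},
    ).json()
    print(answer["response"])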
[1] https://www.macrumors.com/2024/04/11/m4-ai-chips-late-2024/

[2] https://ollama.com/library/qwen:110b