An unquantized Qwen1.5-110B model would require roughly 220GB of RAM (110B parameters at 2 bytes each in fp16), so 100+GB would not be "enough" for that, unless we put a big emphasis on the "+".
I consider "heavily" quantized to be anything below 4-bit quantization. At 4-bit, you could run a 110B model on around 55GB to 60GB of memory. Right now, Llama-3-70B-Instruct is the highest ranked model you can download[0], and you should be able to fit the 6-bit quantization into 64GB of RAM. Historically, 4-bit quantization represents very little quality loss compared to the full 16-bit models for LLMs, but I have heard rumors that Llama 3 might be so well trained that the quality loss starts to occur earlier, so 6-bit quantization seems like a safe bet for good quality.
If you had 128GB of RAM, you still couldn't run the unquantized 70B model, but you could run the 8-bit quantization in a little over 70GB of RAM. That could feel unsatisfying, since you would have so much unused RAM sitting around, and Apple charges a shocking amount of money for RAM.
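If you want to sanity-check these numbers yourself, a decent rule of thumb is parameter count times bits per weight divided by 8. Here is a minimal back-of-envelope sketch in Python (my own throwaway helper, not any particular tool's API); keep in mind that real GGUF files run a bit larger because some tensors stay at higher precision, and you still need headroom for the context/KV cache:

    # back-of-envelope: weights-only memory, ignoring KV cache and runtime overhead
    def approx_weight_gb(params_billions, bits_per_weight):
        # params * bits / 8 = bytes; with params in billions this comes out directly in GB
        return params_billions * bits_per_weight / 8

    for name, params, bits in [
        ("Qwen1.5-110B fp16", 110, 16),  # ~220 GB
        ("Qwen1.5-110B Q4",   110, 4),   # ~55 GB
        ("Llama-3-70B Q6",     70, 6),   # ~53 GB, fits in 64GB with room for context
        ("Llama-3-70B Q8",     70, 8),   # ~70 GB
    ]:
        print(f"{name}: ~{approx_weight_gb(params, bits):.0f} GB")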
However, if you want to use the LLM in your workflow instead of just experimenting with it on its own, you also need enough RAM left over to run everything else comfortably.
96GB of RAM might be a good compromise for now: 64GB is cutting it close, and 128GB leaves more breathing room but is expensive.
Phi 3 Q4 spazzes out on some inputs (emits a stream of garbage), while the FP16 version doesn't (at least for the cases I could find). Maybe they just botched the quantization (I have good results with other Q4 models), but it is an interesting data point.
Phi 3 in particular had some issues at launch with the end-of-text token not being handled correctly, if I'm remembering that right.
I consider "heavily" quantized to be anything below 4-bit quantization. At 4-bit, you could run a 110B model on around 55GB to 60GB of memory. Right now, Llama-3-70B-Instruct is the highest ranked model you can download[0], and you should be able to fit the 6-bit quantization into 64GB of RAM. Historically, 4-bit quantization represents very little quality loss compared to the full 16-bit models for LLMs, but I have heard rumors that Llama 3 might be so well trained that the quality loss starts to occur earlier, so 6-bit quantization seems like a safe bet for good quality.
If you had 128GB of RAM, you still couldn't run the unquantized 70B model, but you could run the 8-bit quantization in a little over 70GB of RAM. Which could feel unsatisfying, since you would have so much unused RAM sitting around, and Apple charges a shocking amount of money for RAM.
[0]: https://leaderboard.lmsys.org/