An unquantized Qwen1.5-110B model would require roughly 220GB of RAM (110B parameters at 2 bytes each in fp16), so 100+GB would not be "enough" for that, unless we put a big emphasis on the "+".
I consider "heavily" quantized to be anything below 4-bit quantization. At 4-bit, you could run a 110B model on around 55GB to 60GB of memory. Right now, Llama-3-70B-Instruct is the highest ranked model you can download[0], and you should be able to fit the 6-bit quantization into 64GB of RAM. Historically, 4-bit quantization represents very little quality loss compared to the full 16-bit models for LLMs, but I have heard rumors that Llama 3 might be so well trained that the quality loss starts to occur earlier, so 6-bit quantization seems like a safe bet for good quality.
If you had 128GB of RAM, you still couldn't run the unquantized 70B model, but you could run the 8-bit quantization in a little over 70GB of RAM. That could feel unsatisfying, since you would have so much unused RAM sitting around, and Apple charges a shocking amount of money for RAM.
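If you want to sanity-check these numbers yourself, a decent rule of thumb is parameter count times bits per weight divided by 8. Here is a minimal back-of-envelope sketch in Python (my own throwaway helper, not any particular tool's API); keep in mind that real GGUF files run a bit larger because some tensors stay at higher precision, and you still need headroom for the context/KV cache:

    # back-of-envelope: weights-only memory, ignoring KV cache and runtime overhead
    def approx_weight_gb(params_billions, bits_per_weight):
        # params * bits / 8 = bytes; with params in billions this comes out directly in GB
        return params_billions * bits_per_weight / 8

    for name, params, bits in [
        ("Qwen1.5-110B fp16", 110, 16),  # ~220 GB
        ("Qwen1.5-110B Q4",   110, 4),   # ~55 GB
        ("Llama-3-70B Q6",     70, 6),   # ~53 GB, fits in 64GB with room for context
        ("Llama-3-70B Q8",     70, 8),   # ~70 GB
    ]:
        print(f"{name}: ~{approx_weight_gb(params, bits):.0f} GB")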
However, if you want to use the LLM in your workflow instead of just experimenting with it on its own, you also need enough RAM left over to run everything else comfortably.
96GB of RAM might be a good compromise for now: 64GB is cutting it close, and 128GB leaves more breathing room but is expensive.
Phi 3 Q4 spazzes out on some inputs (emits a stream of garbage), while the FP16 version doesn't (at least for the cases I could find). Maybe they just botched the quantization (I have good results with other Q4 models), but it is an interesting data point.
Phi 3 in particular had some issues at launch with the end-of-text token not being handled correctly, if I'm remembering that right.
I consider "heavily" quantized to be anything below 4-bit quantization. At 4-bit, you could run a 110B model on around 55GB to 60GB of memory. Right now, Llama-3-70B-Instruct is the highest ranked model you can download[0], and you should be able to fit the 6-bit quantization into 64GB of RAM. Historically, 4-bit quantization represents very little quality loss compared to the full 16-bit models for LLMs, but I have heard rumors that Llama 3 might be so well trained that the quality loss starts to occur earlier, so 6-bit quantization seems like a safe bet for good quality.
If you had 128GB of RAM, you still couldn't run the unquantized 70B model, but you could run the 8-bit quantization in a little over 70GB of RAM. Which could feel unsatisfying, since you would have so much unused RAM sitting around, and Apple charges a shocking amount of money for RAM.
[0]: https://leaderboard.lmsys.org/