> My previous experience with Qwen releases is that the models also have a habit of randomly switching to Chinese for a few words. I wonder if this model is better at responding to English questions with an English response? Maybe we need a benchmark for how well an LLM sticks to responding in the same language as the question, across a range of different languages.

This is trivially resolved with a properly configured sampler/grammar. These LLMs output a probability distribution over the next token, not a single token. If you're not willing to write your own code, you can get around this issue with llama.cpp, for example, using `--grammar "root ::= [^一-鿿ぁ-ゟァ-ヿ가-힣]*"`, which excludes CJK characters from the sampled output.
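The same idea can be sketched without a grammar engine: mask out any vocabulary token containing CJK characters before picking from the distribution. The vocabulary and logits below are hypothetical stand-ins, and the Unicode ranges mirror the ones in the GBNF grammar above; this is a minimal illustration, not llama.cpp's actual sampler code.

```python
import math

def contains_cjk(token: str) -> bool:
    """True if the token contains CJK Unified Ideographs, Hiragana,
    Katakana, or Hangul syllables (same ranges as the grammar)."""
    for ch in token:
        cp = ord(ch)
        if (0x4E00 <= cp <= 0x9FFF or   # CJK Unified Ideographs (一-鿿)
            0x3041 <= cp <= 0x309F or   # Hiragana (ぁ-ゟ)
            0x30A1 <= cp <= 0x30FF or   # Katakana (ァ-ヿ)
            0xAC00 <= cp <= 0xD7A3):    # Hangul syllables (가-힣)
            return True
    return False

def mask_and_pick(vocab, logits):
    """Set masked tokens' logits to -inf, then greedily pick the best
    remaining token (greedy decoding for simplicity)."""
    masked = [l if not contains_cjk(t) else -math.inf
              for t, l in zip(vocab, logits)]
    best = max(range(len(vocab)), key=lambda i: masked[i])
    return vocab[best]

# Toy example: without the mask, "世界" has the highest logit and wins.
vocab  = ["hello", "世界", " world", "こんにちは"]
logits = [1.0, 3.0, 2.5, 2.9]
print(mask_and_pick(vocab, logits))  # -> " world"
```

In a real sampler you would apply the mask to the full logit vector before softmax/temperature sampling, but the filtering step is the same.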
