No, this is different. It is for the base model. This is why I explained in my tweet that we only say the base model quality might be comparable. For the instruct model, there is much room to improve, especially on human eval.

I admit that code switching is a serious problem of ours cuz it really affects the experience of English users. But we find that it is hard for a multilingual model to get rid of this behavior. We'll try to fix it in Qwen2.