That isn't what that research says at all. What it says is that running the same training data through the model multiple times improves training. There is still an ideal model size, though; it's just influenced by the total volume of training data.
https://arxiv.org/pdf/1912.02292
"We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better."
That is the first sentence of the abstract. The first graph shown in the paper backs it up.
Looking into it further, it seems that typical LLMs are trained in the first-descent regime anyway, so my original point isn't very relevant to them. It also looks like the second descent doesn't always reach a lower loss than the first; that appears to depend on other factors as well.
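If you want to see the effect directly, here's a minimal sketch (my own toy setup, not the paper's experiments: random ReLU features fit with minimum-norm least squares) that typically reproduces size-wise double descent, with test error peaking near the interpolation threshold (feature width roughly equal to the number of training samples) and falling again past it:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 20               # input dimension
n_train = 100        # interpolation threshold sits near width == n_train
w_true = rng.normal(size=d)

def make_data(n, noise=0.5):
    # Noisy linear target; the noise is what produces the peak at interpolation.
    X = rng.normal(size=(n, d))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(2000)

for width in [10, 50, 90, 100, 110, 200, 500, 2000]:
    # Random ReLU features of the given width ("model size").
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    Phi_tr = np.maximum(X_tr @ W, 0.0)
    Phi_te = np.maximum(X_te @ W, 0.0)
    # Minimum-norm least squares; pinv covers both the under- and
    # over-parameterized cases.
    beta = np.linalg.pinv(Phi_tr) @ y_tr
    test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"width={width:5d}  test MSE={test_mse:.3f}")
```

Whether the second descent in this toy actually dips below the best under-parameterized model seems to depend on the noise level, which matches the caveat above.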