
Nope, they are the same; grokking is just the case where the KL divergence between the information representable by the network's implicit biases and the data distribution is extremely high (i.e. the network is poorly designed or poorly oriented for the task).
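To make the quantity above concrete, here is a minimal sketch of KL divergence between two discrete distributions. The specific distributions are hypothetical stand-ins: a "prior" for what the architecture's biases favor and a "data" distribution, to show that a mismatch drives the divergence up; they are not drawn from any actual network.

```python
import math

def kl_divergence(p, q):
    # KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0,
    # and uses the convention 0 * log(0/q) = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Well-matched: identical distributions give zero divergence.
uniform = [0.25, 0.25, 0.25, 0.25]
print(kl_divergence(uniform, uniform))  # 0.0

# Mismatched: a skewed "data" distribution against an (assumed) prior
# concentrated elsewhere gives a large divergence (~4.39 nats here).
data = [0.97, 0.01, 0.01, 0.01]
prior = [0.01, 0.01, 0.01, 0.97]
print(kl_divergence(data, prior))
```

On this picture, the higher that mismatch, the longer training takes to overcome it, which is what gets labeled "grokking".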

It's an informal term that not everyone accepts. Double-descent is acceptable because it describes a general phenomenon that is a natural consequence of a phase transition during neural network training. Grokking is, to me, the 'fetch' of neural network terms: it's not new, it adds a seeming layer of separation from double-descent (which it is -- just very delayed), and it's not really accepted by everyone.

I personally do not like it at all. Especially because language affects _our_ implicit biases about what neural networks can and cannot do. We've already seen that their capacities and performance can be pushed way beyond what we traditionally expect of them.

But to summarize: they are the same. This is also why we need good terminology -- poor adoption and boosting of improper terminology induces excess regret in the information exchange between agents, in a game-theoretic sense, in this lovely landscape of the ML world.


