Memorization is storing data. Generalization is developing the heuristics by which you compress stored data. To distill knowledge is to apply those heuristics to lossily compress a large amount of data into a much smaller amount, from which you can nevertheless recover enough information to be useful in the future.
I did not mean to imply that compression implies generalization; if anything, the reverse. Compression is the act of cutting. The compression heuristic is the blade, and generalization is the whetstone that sharpens it. A more general heuristic is to compression what a sharper blade is to cutting.
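
To make the distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the data is assumed to be roughly linear, the "heuristic" is an ordinary least-squares line, and the names `memorize`, `distill`, and `recover` are hypothetical, chosen for this example. Memorizing keeps all 1000 points; distilling keeps two numbers from which approximate values can still be recovered, including for inputs never stored.

```python
import random


def memorize(points):
    """Memorization: keep every (x, y) pair verbatim."""
    return list(points)


def distill(points):
    """Lossy compression via a heuristic (assumption: the data is roughly linear).

    Keeps only the slope and intercept of a least-squares line.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def recover(model, x):
    """Recover an approximate y from the compressed form."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic data for illustration only: y = 3x + 1 plus small noise.
rng = random.Random(0)
points = [(x, 3.0 * x + 1.0 + rng.uniform(-0.5, 0.5)) for x in range(1000)]

stored = memorize(points)   # 1000 pairs kept
model = distill(points)     # 2 numbers kept
estimate = recover(model, 2000)  # x = 2000 was never stored
```

The noise is thrown away, which is the lossy part. A heuristic that fits the data better (a sharper blade) would discard less of what matters for the same storage.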