
These days I'm tending to prefer structural sparsity over pruning, because if you make good choices you get high-quality models that are fast on both CPU and (G/T)PU. The EfficientNet architecture (which leans heavily on separable convolutions) is a good example of this. Using GRUs with block-diagonal matrices and some 'side channel' for communication between the blocks also works very well.
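A quick way to see why these structural choices pay off is to compare parameter counts. The sketch below (my own illustration with made-up layer sizes, not from the comment) counts weights for a standard convolution versus a depthwise-separable one, and for a dense recurrent weight matrix versus a block-diagonal one:

```python
def standard_conv_params(k, c_in, c_out):
    """k x k standard convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1x1 pointwise conv, the pattern EfficientNet-style blocks lean on."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

def block_diagonal_params(h, n_blocks):
    """h x h recurrent weight matrix restricted to n_blocks diagonal blocks,
    i.e. each block of units only connects to itself (side channel not counted)."""
    assert h % n_blocks == 0
    block = h // n_blocks
    return n_blocks * block * block  # = h*h / n_blocks

# Example sizes (arbitrary): 3x3 conv, 64 -> 128 channels
print(standard_conv_params(3, 64, 128))   # 73728
print(separable_conv_params(3, 64, 128))  # 8768  (~8.4x fewer)

# Example sizes (arbitrary): 1024-unit GRU-style matrix, 8 blocks
print(block_diagonal_params(1024, 8))     # 131072 vs 1048576 dense (8x fewer)
```

The sparsity here is fixed at design time, so the matmuls stay dense within each block and map cleanly onto CPU and GPU/TPU kernels, unlike the irregular masks unstructured pruning produces.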

A good structurally sparse model can usually be carefully pruned as well, but the gains are a bit smaller once you've already settled into a 'small enough' model architecture.


