
These days I'm tending to prefer structural sparsity over pruning, because if you make good choices you get high-quality models that are fast on both CPU and (G/T)PU. The EfficientNet architecture (which leans heavily on separable convolutions) is a good example of this. Using GRUs with block-diagonal matrices and some 'side channel' for communication between the blocks also works very well.
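A quick way to see why these structural choices pay off is to compare parameter counts. The sketch below (my own illustration with made-up layer sizes, not from the comment) counts weights for a standard convolution versus a depthwise-separable one, and for a dense recurrent weight matrix versus a block-diagonal one:

```python
def standard_conv_params(k, c_in, c_out):
    """k x k standard convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1x1 pointwise conv, the pattern EfficientNet-style blocks lean on."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

def block_diagonal_params(h, n_blocks):
    """h x h recurrent weight matrix restricted to n_blocks diagonal blocks,
    i.e. each block of units only connects to itself (side channel not counted)."""
    assert h % n_blocks == 0
    block = h // n_blocks
    return n_blocks * block * block  # = h*h / n_blocks

# Example sizes (arbitrary): 3x3 conv, 64 -> 128 channels
print(standard_conv_params(3, 64, 128))   # 73728
print(separable_conv_params(3, 64, 128))  # 8768  (~8.4x fewer)

# Example sizes (arbitrary): 1024-unit GRU-style matrix, 8 blocks
print(block_diagonal_params(1024, 8))     # 131072 vs 1048576 dense (8x fewer)
```

The sparsity here is fixed at design time, so the matmuls stay dense within each block and map cleanly onto CPU and GPU/TPU kernels, unlike the irregular masks unstructured pruning produces.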

A good structurally sparse model can usually be carefully pruned as well, but the gains are a bit smaller once you've already settled into a 'small enough' model architecture.


