
Wouldn't this require larger datasets? That isn't always an option. I'm imagining that a smaller, more computationally efficient network could learn nearly as well with fewer data points given these heavily engineered features. Is that off base?


Basically, no. See http://karpathy.github.io/2015/05/21/rnn-effectiveness/

He gets pretty amazing results with a corpus size around 10M.
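For reference, here's roughly the kind of model that post describes: a character-level RNN trained on raw text with no hand-engineered features. This is just a minimal sketch in PyTorch (my own toy corpus and hyperparameters, not Karpathy's actual code):

```python
import torch
import torch.nn as nn

# Stand-in corpus; the linked post trains on files in the megabyte range.
text = "hello world, hello hacker news "
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        h, state = self.rnn(self.embed(x), state)
        return self.head(h), state

data = torch.tensor([stoi[c] for c in text]).unsqueeze(0)  # shape (1, T)
model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    # Predict the next character at every position from the raw characters alone.
    logits, _ = model(data[:, :-1])
    loss = loss_fn(logits.reshape(-1, len(chars)), data[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point is that the model's only input is the character stream itself; everything else is learned.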


But that takes ages to train!


For perspective: something like Jason Weston's state-of-the-art attention-based neural sentence summarizer took ~4 days to train.

You'd easily spend that time doing manual feature engineering just to build a baseline system.



