He gets pretty amazing results with a corpus size of around 10M.
You'd easily spend that time doing manual feature engineering just to build a baseline system.