Hacker News

The paper from the same group last year on the topic is here: http://rsif.royalsocietypublishing.org/content/8/59/842.full...

They use random forest out-of-sample error as their metric, but they do feature selection before this step (see table 6), so the held-out samples have already influenced which features were kept.

As far as I can make out from a quick reading, they are essentially making the error described here: http://www-stat.stanford.edu/%7Etibs/sta306b/cvwrong.pdf

and elegantly described in this recent blog post: http://blog.kaggle.com/2012/07/06/the-dangers-of-overfitting...

On a sample size of only 42 people, overfitting seems very likely.
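
To make the concern concrete, here is a rough sketch of how selecting features before cross-validation inflates the score. It is not the paper's pipeline: the data are pure noise that I generate myself, I use a simple nearest-centroid classifier in place of a random forest to keep it dependency-free, and the feature count (1000) and number kept (10) are arbitrary assumptions. Only the sample size of 42 comes from the paper.

```python
import random

def select_features(X, y, k):
    """Return indices of the k features whose class means differ the most."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def gap(j):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]

def predict(X_train, y_train, x, feats):
    """Nearest-centroid prediction restricted to the given features."""
    def centroid(label):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        return [sum(r[j] for r in rows) / len(rows) for j in feats]

    def sq_dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(feats, c))

    return 1 if sq_dist(centroid(1)) < sq_dist(centroid(0)) else 0

def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy, selecting features inside or outside the folds."""
    n = len(X)
    # The mistake: choose features once, using every sample including the
    # ones that will later be held out.
    global_feats = None if select_inside_fold else select_features(X, y, k)
    correct = 0
    for i in range(n):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k) if select_inside_fold else global_feats
        correct += int(predict(X_tr, y_tr, X[i], feats) == y[i])
    return correct / n

rng = random.Random(0)
n, p, k = 42, 1000, 10  # 42 matches the study; p and k are made up
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]  # balanced labels, unrelated to X by construction

leaky = loo_accuracy(X, y, k, select_inside_fold=False)
honest = loo_accuracy(X, y, k, select_inside_fold=True)
print(f"selection before CV: {leaky:.2f}")
print(f"selection inside CV: {honest:.2f}")
```

Since the labels have nothing to do with the features, any accuracy well above chance in the first number comes purely from the selection step having seen the held-out samples. Redoing the selection inside each fold removes that leak.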


