The paper from the same group last year on the topic is here: http://rsif.royals...

They use random forest out-of-sample error as their metric, but they do feature selection before that step (see table 6), so the held-out samples have already influenced which features the model gets to see.

As far as I can make out from a quick reading, they are essentially making the error described here: http://www-stat.stanford.edu/%7Etibs/sta306b/cvwrong.pdf

With a sample size of only 42 people, overfitting seems very likely.
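
To make the concern concrete, here is a rough sketch of the effect on pure noise. It is not the paper's pipeline: I have swapped the random forest for a nearest-centroid classifier and the feature selection for a simple mean-difference ranking so it runs with the standard library only. The 2000 features, the 20 kept features, and all function names are my own assumptions; only the 42 samples come from the paper. Since the labels are random, an honest estimate should hover around chance, while selecting features on the full data first should look much better than that.

```python
import random

def top_features(X, y, rows, k):
    """Rank features by absolute difference in class means, using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]

    def score(j):
        mean_pos = sum(X[i][j] for i in pos) / len(pos)
        mean_neg = sum(X[i][j] for i in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def predict(X, y, train, test_row, feats):
    """Nearest-centroid prediction for one held-out row (stand-in classifier)."""
    dists = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroid = [sum(X[i][j] for i in members) / len(members) for j in feats]
        dists[label] = sum((X[test_row][j] - c) ** 2 for j, c in zip(feats, centroid))
    return min(dists, key=dists.get)

def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy, with selection done inside or outside the folds."""
    rows = list(range(len(X)))
    global_feats = top_features(X, y, rows, k)  # has seen every sample
    correct = 0
    for held_out in rows:
        train = [i for i in rows if i != held_out]
        if select_inside_fold:
            feats = top_features(X, y, train, k)  # held-out row never seen
        else:
            feats = global_feats  # the mistake: held-out row shaped the selection
        correct += predict(X, y, train, held_out, feats) == y[held_out]
    return correct / len(rows)

# Synthetic data: 42 "people", 2000 noise features, labels unrelated to features.
rng = random.Random(0)
n_samples, n_features, n_keep = 42, 2000, 20
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [0] * 21 + [1] * 21
rng.shuffle(y)

biased_acc = loo_accuracy(X, y, n_keep, select_inside_fold=False)
honest_acc = loo_accuracy(X, y, n_keep, select_inside_fold=True)
print(f"selection before CV: {biased_acc:.2f}")
print(f"selection inside CV: {honest_acc:.2f}")
```

The only difference between the two runs is whether the held-out person was included when the features were ranked. Any gap between the two numbers is therefore produced by the evaluation procedure alone, since there is no signal in the data to find.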