
Probability matching is hardly what I would call a deterministic algorithm. What the author likely means is that we should use a probabilistic algorithm rather than a random one.

Nitpicking aside, I disagree with the usage of the original quote:

> β€œIt appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought.” E.T. Jaynes

What randomized methods are good at is reducing bias. In our case, there are several sources of bias:

- The population that upvotes links on the 'new' page is different from the one that upvotes links on the frontpage.

- The probability of upvotes (which I assume is either the ratio of upvotes to impressions or the ratio of upvotes to clicks) is itself determined by several factors other than article quality: article type and headline quality, to cite the main culprits (and there is another interesting debate about quality vs. popularity, since sensationalized headlines are likely to get more clicks), but there are many others.

I have three problems with this article:

1. The proposed strategy addresses variance but not bias, which is where this problem differs from a simple multi-armed bandit. Now we could build sophisticated models that try to take all of these factors into account and get an unbiased estimator of quality, but it's likely to be a pursuit that's not worth the payoff.

2. Adding random noise to the rankings is likely to be much more efficient. The author here suggests computing the posterior probability of an article being better than another for each pair of articles, which seems rather impractical.
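To make the noise idea concrete, here's a minimal sketch. The function name, score values, and the Gaussian noise scale are all my own assumptions, not anything from the article:

```python
import random

def noisy_rank(scores, sigma=0.1, rng=random):
    """Sort article ids by score plus Gaussian jitter, best first.
    sigma trades off exploration against fidelity to the point estimates."""
    return sorted(scores, key=lambda a: scores[a] + rng.gauss(0.0, sigma),
                  reverse=True)

# Hypothetical score estimates per article id.
scores = {"a": 0.90, "b": 0.50, "c": 0.45}
print(noisy_rank(scores, sigma=0.05))
```

With sigma=0 this reduces to a plain deterministic sort, so the amount of randomization is a single tunable knob.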

3. Most importantly: is the overall quality of the frontpage really the ultimate optimization objective? This solution doesn't solve the problem of the (N+1)th article receiving far fewer votes than the Nth (where N is the number of articles shown on the frontpage) while being arguably equally deserving. I would argue that minimizing the probability of a good article getting few pageviews is actually the more important problem, because we have more good articles than slots on the frontpage. A randomized algorithm would be much better equipped for this purpose.

Of course, the other thing is that for bragging purposes "We made the frontpage of HN" sounds much better than "We made the frontpage of HN for 60% of the users". But that's a different problem altogether...

Edit: writing this comment made me think of alternative solutions. An easy-to-implement one would be to add 5 or so slots at the bottom of the frontpage that would contain random picks from the 2nd or 3rd page in a rotating manner. That would give good articles which weren't fortunate enough to get good visibility a chance at floating up to the top, without all the implications of changing the ranking algorithm itself. In other words, add fuzziness to the threshold instead of the ranking.
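A sketch of that rotating-slot idea; the page size, slot count, and function name are all hypothetical choices of mine:

```python
import random

def frontpage_with_fuzzy_slots(ranked, page_size=30, extra_slots=5, rng=random):
    """Normal frontpage plus a few slots filled with random picks
    from the 2nd and 3rd pages of the ranking."""
    front = ranked[:page_size]
    pool = ranked[page_size : 3 * page_size]  # pages 2 and 3
    picks = rng.sample(pool, min(extra_slots, len(pool)))  # without replacement
    return front + picks

page = frontpage_with_fuzzy_slots(list(range(90)))
```

The top of the ranking is untouched; only the tail of the frontpage is fuzzy, which is the point.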



I have little to dispute in what you've written.

However, one place where I'd argue with you strongly: beliefs about the world like these should be handled as part of your model. For example, you mention different upvote probabilities on the 'new' page and the frontpage - I'd simply model that with a new parameter beta (possibly related to alpha in some way).

It's far from clear to me how randomization such as what Dan Luu proposed actually addresses this at all.

I'll also discuss one other point:

> The author here suggests computing the posterior probability of an article being better than another for each pair of articles, which seems rather impractical.

It probably is. However, suppose we went through the effort of computing posteriors, only to discover that we get stuck at the stage of computing pairwise probabilities. There is still a very good chance we can draw samples from the posteriors.

Even though our deterministic algorithm has failed, we still get a much better randomized algorithm out of the exercise. Given the posteriors on P(good article), we can draw samples from them and employ Thompson sampling to sort the articles. (I.e., draw a sample from each article's posterior, then order the articles by decreasing sampled value.)
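A minimal sketch of that sampling step, assuming Beta posteriors over each article's upvote probability (the counts and article names here are made up for illustration):

```python
import random

# Hypothetical (upvotes, impressions) counts per article; with a uniform
# prior, the posterior on P(upvote) is Beta(1 + up, 1 + imp - up).
counts = {"a": (10, 100), "b": (3, 20), "c": (50, 400)}

def thompson_rank(counts, rng=random):
    """Draw one posterior sample per article, then sort best-first."""
    sampled = {
        name: rng.betavariate(1 + up, 1 + imp - up)
        for name, (up, imp) in counts.items()
    }
    return sorted(sampled, key=sampled.get, reverse=True)

print(thompson_rank(counts))
```

Each page load draws fresh samples, so the ranking is random but articles with stronger posteriors win more often.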



