
Does anyone have a reference for solving multi-armed bandit problems with a finite time horizon? I'd like something that derives rules or heuristics for how the explore/exploit tradeoff should change as the horizon approaches.

This seems like an obvious extension, and given how long this problem has been around, someone must have worked on it, but I've been unable to find anything. Any pointers?



What do you mean? Most analyses of multi-armed bandit algorithms assume a finite time horizon. When the horizon is unknown or infinite, they use the doubling trick: restart the algorithm with a horizon that doubles each time.
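A rough Python sketch of both ideas follows (illustrative, not code from anyone in the thread). It assumes Bernoulli arms and uses explore-then-commit as the horizon-aware base algorithm, with an exploration length of m = ceil((T/K)^(2/3)) pulls per arm chosen for illustration. The function names `explore_then_commit` and `doubling_trick` are hypothetical. Note how the horizon T sets the explore/exploit split: m grows sublinearly in T, so a longer horizon explores more in absolute terms but a smaller fraction overall.

```python
import math
import random


def explore_then_commit(arms, horizon, rng, steps=None):
    """Explore-then-commit tuned for `horizon`, playing `steps` pulls (default: all).

    arms: list of Bernoulli success probabilities (an assumption of this sketch).
    The exploration length m depends on the horizon; the (T/K)^(2/3) scaling
    is an illustrative choice, not the only possible tuning.
    """
    k = len(arms)
    steps = horizon if steps is None else steps
    m = max(1, math.ceil((horizon / k) ** (2 / 3)))
    sums = [0.0] * k
    counts = [0] * k
    best = None
    total = 0.0
    for t in range(steps):
        if t < m * k:
            arm = t % k  # exploration: round-robin, m pulls per arm
        else:
            if best is None:  # commit once to the empirically best arm
                best = max(range(k), key=lambda i: sums[i] / counts[i])
            arm = best
        reward = 1.0 if rng.random() < arms[arm] else 0.0
        sums[arm] += reward
        counts[arm] += 1
        total += reward
    return total


def doubling_trick(arms, total_steps, rng):
    """Play `total_steps` pulls without tuning the base algorithm to that length.

    Epoch j restarts explore-then-commit from scratch, tuned for horizon 2**j.
    The last epoch is cut off when the pulls run out, but its tuning still
    uses 2**j, so the base algorithm never sees the true total length.
    Returns (total reward, list of epoch horizons).
    """
    played, total, j = 0, 0.0, 0
    epochs = []
    while played < total_steps:
        horizon = 2 ** j
        steps = min(horizon, total_steps - played)
        total += explore_then_commit(arms, horizon, rng, steps)
        epochs.append(horizon)
        played += steps
        j += 1
    return total, epochs


reward, epochs = doubling_trick([0.1, 0.9], 1000, random.Random(0))
print(epochs)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
```

Restarting from scratch throws away earlier samples, which costs at most a constant factor in regret for bounds of the form T^a. That is why the trick is usually treated as a routine way to drop the known-horizon assumption.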


Thank you, I now realize I had misunderstood the notation.



