Wow. I'm generally in the AI maximalist camp. But adding Werewolf feels dangerous to me. Anyone who's played knows lying, deceit, and manipulation are often key to winning. Do we really want models climbing this benchmark?
There were two villagers and one werewolf. The werewolf started the round by saying "I'm the werewolf, vote for me," and then the game ended with a villager win.
Overnight he had successfully taken out the doctor. It made no sense, in my opinion.
There were some funny bits, like one of the Anthropic models forgetting a rule, which led to everyone accusing him of being a werewolf in a pile-on. He wasn't a werewolf; he genuinely forgot the rule. Happens in nearly every human game of Werewolf.
Negative benchmark, isn't it? No sane lab is going to release PR that states "our newest model is best at lying." If anything, the reverse may occur: if this catches on, they will make their model play Werewolf badly and claim "alignment improvements, our model no longer lies as much in Werewolf," while it lies more often in other domains.