GPT Overperformance over Humans in Cognitive Reframing of Negative Scenarios (osf.io)
44 points by CharlesW on April 21, 2024 | hide | past | favorite | 9 comments


I certainly didn't have "GPT-4 scores higher on empathy than actual humans" on my bingo card. That's quite impressive, even if the task played to GPT's strengths and it competed against people being paid $12/h to fill out studies on Prolific.


Neither did I, but I ought to have, given Moravec's paradox, how effective even ELIZA was at this, and how cute animals affect us.


I don't think it necessarily scores higher, but it certainly has a narrower distribution than humans. If the average of the training corpus is closer to the average human than a given random human is, then it will generally produce more relatable content. If you have a look at the spread of ratings between GPT and humans, you'll see quite heavy tails on human evaluations, while for GPT they're very light. This is natural: we click with some people and dislike others. Why? Because with some, the direction our bias takes us from the mean is quite alike, and with others we are very distant and cannot relate to one another. GPT relates to all equally (well/badly), so while I think GPT could be better than a randomly selected person, a carefully selected person would be hard to beat.
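The "closer to the mean beats a random person on average" claim can be checked with a quick simulation. This is an illustrative sketch only: it assumes relatability decays with distance on a single, normally distributed latent trait, which is a simplification I'm introducing, not something from the study.

```python
import random
import statistics

# Toy model: each person has a position on one latent trait, drawn from a
# standard normal. "GPT" sits at the population mean; a random human is
# another draw. If relatability falls off with distance, the mean-sitter
# is closer to a random reader ON AVERAGE, even though a well-matched
# human (a small pairwise distance) beats it easily.
random.seed(0)
population = [random.gauss(0, 1) for _ in range(100_000)]
mean = statistics.fmean(population)

# Average distance from a random reader to the mean (the "GPT" case).
# Theory for a standard normal: sqrt(2/pi) ~ 0.798.
dist_to_mean = statistics.fmean(abs(x - mean) for x in population)

# Average distance between two randomly paired people (the "random
# human" case). Theory: 2/sqrt(pi) ~ 1.128 -- strictly larger.
pairs = zip(population[::2], population[1::2])
dist_between = statistics.fmean(abs(x - y) for x, y in pairs)

print(f"reader vs. mean:         {dist_to_mean:.3f}")
print(f"reader vs. random human: {dist_between:.3f}")
```

The general fact behind it: for i.i.d. X, Y with mean μ, E|X − μ| ≤ E|X − Y|, so a responder pinned at the mean never loses on average, while the heavy tails of human-to-human distances are exactly the "click or clash" spread the comment describes.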


It seems to be sensitive to exactly how it was scored. Humans did better at certain types of tasks and GPT better at others. And either way, I wouldn't say this is "empathy".


It wasn't empathy, but it did generate reframings that human reviewers scored higher on the metric "this rethinking is empathic". So it was better at generating the impression of empathy, which is the same standard we generally apply to humans when judging their empathy, even if that is subtly wrong.


> "all watched over / by machines of loving grace" —Richard Brautigan


I guess this is good news for the Voight-Kampff test.

https://bladerunner.fandom.com/wiki/Voight-Kampff_test


I think what helps here is that GPT-4 will be closer to the average human than a random human, so its responses will, on average, be more relatable. I think that when you're paired with the right human and your biases are both in the same direction from the mean, that synergistic effect won't be beatable by an LLM, but it doesn't surprise me that it outperforms humans at being the best on average. Heck, I'd wager GPT-3 would be better as well.


"Humans are, on average, even bigger assholes than computers."



