GPT Overperformance over Humans in Cognitive Reframing of Negative Scenarios (osf.io)
44 points by CharlesW on April 21, 2024 | hide | past | favorite | 9 comments


I certainly didn't have "GPT-4 scores higher on empathy than actual humans" on my bingo card. That's quite impressive, even if the task played to GPT's strengths and it competed against people being paid $12/h to fill out studies on Prolific.


Neither did I, but I ought to have, given Moravec's paradox, how effective even ELIZA was at this, and how cute animals affect us.


I don't think it necessarily scores higher, but it certainly has a narrower distribution than humans. If the average of the training corpus is closer to the average human than a given random human is, then it will generally produce more relatable content. If you have a look at the spread of ratings between GPT and humans, you'll see quite heavy tails on human evaluations, while for GPT they're very light. This is natural: we click with some people and dislike others. Why? Because with some, the direction our bias takes us from the mean is quite alike, and with others we are very distant and cannot relate to one another. GPT relates to all equally (well/badly), so while I think GPT could be better than a randomly selected person, a carefully selected person would be hard to beat.
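The "closer to the mean beats a random person on average" claim can be checked with a quick simulation. This is an illustrative sketch only: it assumes relatability decays with distance on a single, normally distributed latent trait, which is a simplification I'm introducing, not something from the study.

```python
import random
import statistics

# Toy model: each person has a position on one latent trait, drawn from a
# standard normal. "GPT" sits at the population mean; a random human is
# another draw. If relatability falls off with distance, the mean-sitter
# is closer to a random reader ON AVERAGE, even though a well-matched
# human (a small pairwise distance) beats it easily.
random.seed(0)
population = [random.gauss(0, 1) for _ in range(100_000)]
mean = statistics.fmean(population)

# Average distance from a random reader to the mean (the "GPT" case).
# Theory for a standard normal: sqrt(2/pi) ~ 0.798.
dist_to_mean = statistics.fmean(abs(x - mean) for x in population)

# Average distance between two randomly paired people (the "random
# human" case). Theory: 2/sqrt(pi) ~ 1.128 -- strictly larger.
pairs = zip(population[::2], population[1::2])
dist_between = statistics.fmean(abs(x - y) for x, y in pairs)

print(f"reader vs. mean:         {dist_to_mean:.3f}")
print(f"reader vs. random human: {dist_between:.3f}")
```

The general fact behind it: for i.i.d. X, Y with mean μ, E|X − μ| ≤ E|X − Y|, so a responder pinned at the mean never loses on average, while the heavy tails of human-to-human distances are exactly the "click or clash" spread the comment describes.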


It seems to be sensitive to exactly how it was scored. Humans did better at certain types of tasks and GPT better at others. And either way, I wouldn't say this is "empathy".


It wasn't empathy, but it did generate reframings that human reviewers scored higher on the metric "this rethinking is empathic". So it was better at generating the impression of empathy, which is the same standard we generally apply to humans when judging their empathy, even if that is subtly wrong.


> "all watched over / by machines of loving grace" —Richard Brautigan


I guess this is good news for the Voight-Kampff test.

https://bladerunner.fandom.com/wiki/Voight-Kampff_test


I think what helps here is that GPT-4 will be closer to the average human than a random human, so its responses will, on average, be more relatable. I think that when you're paired with the right human and your biases are both in the same direction from the mean, that synergistic effect won't be beatable by an LLM, but it doesn't surprise me that it outperforms humans at being the best on average. Heck, I'd wager GPT-3 would be better as well.


"Humans are, on average, even bigger assholes than computers."



