The idea that an AI lab would pay a small army of human artists to create training data for $animal on $transport just to cheat on my stupid benchmark delights me.
They were caught using all the data on the internet without asking for permission or compensating anyone. And it has cost them nothing and earned them billions so far.
Vetting them for the potential for whistleblowing might be a bit more involved. But conspiracy theories have an advantage because the lack of evidence is evidence for the theory.
Huh? AI labs are routinely spending millions to billions to various 3rd party contractors specializing in creating/labeling/verifying specialized content for pre/post-training.
This would just be one more checkbox buried in hundreds of pages of requests, and compared to plenty of other ethical grey areas like copyright laundering with actual legal implications, leaking that someone was asked to create a few dozen pelican images seems like it would be at the very bottom of the list of reputational risks.
How do you think who's in on that? Not only pelicans, I mean, the whole thing. CEOs, top researchers, select mathematicians, congressmen? Does China participate in maintaining the bubble?
I, myself, prefer the universal approximation theorem and empirical finding that stochastic gradient descent is good enough (and "no 'magic' in the brain", of course).
Well, since we're all talking about sourcing training material to "benchmaxx" for social proof, and not litigating the whole "AI bubble" debate, just the entire cottage industry of data curation firms:
I think no matter what happens with AI in the future, there will always be a subset of people with elaborate conspiracies about how it's all fake/a hoax.
I'm not saying it's a hoax. If it gets better because of that data, tant mieux, but we have to be clear eyed about what these models are actually doing. Especially when companies don't explain what they've done.