And this is probably going to get even worse the more automatic classification is used to promote or silence content.
A pretty interesting result of this is what I'd call "TikTok speak", where words are replaced either by similar-sounding ones ("porn" => "corn", oftentimes just the corn emoji) or by neologisms ("to kill" => "to unalive"), in the hope of getting around the filters.
This turns natural language on the internet into even more of a moving target than it already used to be.
The most interesting thing, imo, is that people often say one thing but put a similar-sounding word or a homophone in the subtitles, and the filter seems to trust the user-supplied subtitles.
I hope nobody trains speech-to-text systems on a TikTok dataset.