Hacker News

And this is probably going to get even worse the more automatic classification is used to promote or silence content.

A pretty interesting result of this is what I'd call "TikTok speak", where words are replaced either by similar-sounding ones ("porn" => "corn", often just the corn emoji) or by neologisms ("to kill" => "to unalive"), in the hope of getting around the filters.
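A toy sketch of why this works (purely illustrative; nothing here reflects TikTok's actual moderation system): a naive keyword blocklist only matches exact words, so a neologism or sound-alike substitution sails right past it.

```python
# Hypothetical blocklist filter -- the terms and logic are invented
# for illustration, not taken from any real moderation system.
BLOCKLIST = {"kill", "porn"}

def is_flagged(text: str) -> bool:
    """Flag text if any whitespace-separated word is on the blocklist."""
    words = text.lower().split()
    return any(word.strip(".,!?") in BLOCKLIST for word in words)

print(is_flagged("they tried to kill him"))     # flagged
print(is_flagged("they tried to unalive him"))  # not flagged: neologism evades the exact match
```

Real classifiers are fuzzier than this, but the arms-race dynamic is the same: each substitution that catches on forces the filter to be retrained, which in turn prompts new substitutions.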

This turns natural language on the internet into even more of a moving target than it already used to be.



The most interesting thing, imo, is that people often say one thing but put a similar-sounding word or a homophone in the subtitles, and the filter seems to trust the user-supplied subtitles.

I hope nobody trains speech-to-text systems on a TikTok dataset.


The core problem is unsolvable: any automated system can be defeated by a sufficiently motivated human.



