Back when GPT-3 was first announced I got kind of scared, and decided to download the then-current Kiwix ZIM archives of Wikipedia, Stack Overflow, Wikihow, Wikisource, and a number of other similar sites.
I'm kind of glad that I did, and intend to keep these versions "forever", as examples of pre-LLM human-generated content.
We are already seeing this with sites that pump as many prompts as they can through Stable Diffusion and spam the internet with junk images. Future systems will at least need quality discriminators when training on these images.