There's of course also the issue that an increasing fraction of web content reading is being done by AI agents. I wonder what the Pareto front here is.
No one has successfully rebutted that paper about model collapse, the degradation that happens when models train on their own output over successive generations. It's just a matter of time before we find out whether it was right.
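For what it's worth, the mechanism that paper describes can be sketched in its simplest form without any LLM at all: fit a Gaussian, sample from the fit, refit on those samples, repeat. This is only a toy analogy (the sample size and generation count below are arbitrary choices, not anything from the paper), but it shows how estimation noise compounds until the learned distribution degenerates:

```python
import numpy as np

rng = np.random.default_rng(0)

def collapse(generations=2000, n_samples=100):
    """Repeatedly refit a Gaussian on samples drawn from the previous fit."""
    mu, sigma = 0.0, 1.0           # generation 0: the "real" data distribution
    variances = [sigma ** 2]
    for _ in range(generations):
        data = rng.normal(mu, sigma, n_samples)  # sample from the current model
        mu, sigma = data.mean(), data.std()      # refit (MLE) on its own output
        variances.append(sigma ** 2)
    return variances

vs = collapse()
print(f"variance at gen 0: {vs[0]:.3f}, at gen 2000: {vs[-1]:.2e}")
```

With a finite sample at every step, the fitted variance follows a multiplicative random walk with downward drift, so the distribution narrows toward a point over generations. Whether anything like this survives contact with real training pipelines (fresh human data mixed in, filtering, RLHF) is exactly what the thread is arguing about.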
There are dozens, if not hundreds, of papers (from major research labs) showing that training LLMs on synthetic data actually improves their performance. For instance, much of the RLHF done by MS/Facebook likely used data generated by an LLM. DeepSeek has also had similar accusations thrown their way.
I believe the paper you're referencing was narrowly discussing text-to-image models and didn't incorporate prompt engineering or good old-fashioned search to improve the quality of synthetic data.
it's been a while though, so i could be wrong. effectively i'm saying it's not quite that simple, and it isn't necessarily some unsolvable doomsday clock for all LLMs.
This is going to reshape large portions of our text-based communication networks.