
> What way too many people claim, however, is that the machine isn't even allowed to look at GPL'ed code for some reason, while humans are.

Why would those be the same thing? It's a matter of scale. Just like how people are allowed to read websites, but scraping is often disallowed.



> Just like how people are allowed to read websites, but scraping is often disallowed.

Hosting code on GitHub explicitly allows this type of usage (scraping) according to their TOS, so I have to ask again: why the sudden complaints?

Are we still talking about a shortcoming of the ML model, which very occasionally spits out a few lines of copied code, or should we include search engines in this, since they do the exact same thing by design?

robots.txt, for example, is likewise non-binding and purely advisory, and Common Crawl [0] (also used to train GPT-3) publishes a dataset that by definition contains GPL'ed code, no matter where it is hosted. So is that off-limits now, too?
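
To illustrate how advisory it is: a crawler has to actively opt in to honoring robots.txt. A minimal Python sketch using the standard library (the host and user-agent string are placeholders):

    import urllib.robotparser

    # Consulting robots.txt is entirely voluntary; nothing enforces this check.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # A polite crawler asks before fetching; an impolite one simply doesn't.
    if rp.can_fetch("MyCrawler", "https://example.com/some/page"):
        pass  # fetch the page here

A crawler that skips the check runs just the same, which is the point: compliance is a convention, not an enforcement mechanism.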

[0] http://commoncrawl.org



