
Why and how? I'm honestly interested in an answer here.

What exactly is the difference between a machine learning patterns and techniques by looking at code and a person doing the same?

Is every programmer who ever gazed at GPL'ed code guilty of plagiarism and licensing violations because everything they write has to be considered derivative work now?



I can think of a few things here. As human beings we have limitations. We get tired of gazing at code, GPL'ed or not. GitHub's clusters don't. That calls fair use of copyrighted content into question. The next concern I have is what happens when Copilot produces certain code verbatim. I saw the other day on HN that it produced some Quake code verbatim. See https://news.ycombinator.com/item?id=27710287


> As human beings we have limitations.

That's a fair point. ML models don't seem to memorise all the code they've seen either, though. Plus, while the argument of human limitations applies to the vast majority of people, what about those with eidetic memory?

> what happens when Copilot produces certain code verbatim?

There are several options: suppress the result, annotate it with a proper reference, or mark the snippet as GPL'ed.
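To make those options concrete, here is a rough sketch of how such a filter could look. It is not how Copilot works; everything in it (the names `KnownSource`, `check_suggestion`, the 8-token window, the license labels) is made up for illustration. It flags a suggestion when it shares a run of consecutive tokens with a known source, then either suppresses it or annotates it depending on the license.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a shared run of 8 consecutive tokens counts as "verbatim".
NGRAM = 8

def tokens(code: str) -> list:
    # Identifiers, digit runs and single punctuation characters;
    # whitespace differences are ignored.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def shingles(code: str, n: int = NGRAM) -> set:
    # All runs of n consecutive tokens (empty if the code is shorter than n).
    toks = tokens(code)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

@dataclass
class KnownSource:
    # Hypothetical record for a piece of code with a known license.
    name: str
    license: str
    code: str

@dataclass
class Decision:
    action: str                 # "pass", "annotate" or "suppress"
    note: Optional[str] = None  # reference to show the user, if any

def check_suggestion(suggestion, corpus, suppress_licenses=frozenset({"GPL"})):
    # Compare the suggestion against every known source in turn.
    suggested = shingles(suggestion)
    for src in corpus:
        if suggested & shingles(src.code):
            note = f"matches {src.name} ({src.license})"
            if src.license in suppress_licenses:
                return Decision("suppress", note)
            return Decision("annotate", note)
    return Decision("pass")
```

A real system would need an index instead of a linear scan, and a fixed token window is a crude stand-in for whatever "verbatim" should mean legally.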

There are technical solutions to this question, but it's also important to ask to what degree this is necessary.

Is a search engine that returns code snippets regardless of license also a tool that needs to be discussed the same way? After all, code samples from StackOverflow or RosettaCode are copied on a regular basis and not every example provides a proper reference as to where it's been taken from.

So maybe a hint like "may contain results based on GPL'ed code" suffices? I don't know, but that's a question best deferred to software copyright law experts.



