
To a first approximation, this is irrelevant, because reading code is not subject to any license. What if a human reads some restrictively licensed code and, years later, uses an idea he noticed in that code, perhaps no longer even aware of where the idea came from?

But what if the system memorizes entire functions? What if a human does so? What if you change all the variable names? What if you rearrange the control flow a bit? What if you just change the spacing? What if two humans write the exact same code independently? Is every for loop with i from 0 to n a license violation?

I am not picking a side, but the problem is certainly much more nuanced than either side of the argument wants to paint it.



I agree that it's nuanced and that it's difficult to draw the line. But where copilot sits is way over on the plagiarizing side of the spectrum. Wherever we agree to draw the line, copilot should definitely fall on the wrong side of it.

Copilot will replicate entire functions, including comments, from licensed code.


> but where copilot sits is way over on the plagiarizing side of the spectrum

I think it is important to point out that not all Copilot output is on the plagiarizing side of the spectrum. However, it does occasionally produce plagiarized code. Most importantly, there is no indication when this occurs.


> What if a human reads some restrictively licensed code and years later uses some idea he noticed in that code, maybe even no longer being aware from where this idea comes?

In general, using the idea is fine, whether the code is written by an AI or a human. I think the major concern here is when the code is copied verbatim or near verbatim (AKA the produced code is not "transformative" of the original).

> But what if the system memorizes entire functions? What if a human does so?

In both of these cases, I believe it would be a copyright concern. It is not strictly defined, and it depends on the complexity of the function. If you memorized (|a| a + 1), I doubt any court would call that copying a creative work. But if you memorized the Quake fast inverse square root, it is likely protected under copyright, even if you changed the variable names and formatting.

It seems clear to me that GitHub Copilot is capable of producing code that is copyrighted and must be used according to the copyright owner's license. Worse still, it doesn't appear to be capable of knowing when it is doing that, or what the source is.


The problem is that humans are limited in retention and rate of learning. An AI/ML system is not, which makes (or should make) a difference.


Sure, it might well be the case that different rules should apply to humans and machines, but this only makes the discussion even more nuanced. Still, I don't think this could reasonably be used to ban machines from ingesting code under certain licenses, even though it might restrict what they can do with that information.



