
Could this be the beginning of the true test of open source licenses? My understanding is that no court has ever issued a ruling that sets precedent on the validity or scope of any open source license. I can see a class action suit coming on behalf of all authors of GPL-licensed code.


What? There have been plenty of GPL cases defended in court.

https://en.m.wikipedia.org/wiki/Open_source_license_litigati...


All of the copyright cases were settled, so no precedent was set. Open source as a contract has been ruled legal, and licensors can sue for breach of contract, which is not the same as copyright infringement.

I think my point still stands.


GitHub used code that wasn't under any license at all, just publicly visible. Their claim is not that the license allows what they're doing, but that they do not need a license.


Which is a different issue from my point, but still very valid. What terms are implied if no license is specified? I would argue attribution should be expected if the code is used, but I also wouldn't go near any code without a specific license attached, as there's no express permission given. Just because a license isn't disclosed doesn't mean it isn't there.

You can't go copying anything and everything just because nobody has told you that you can't. And I feel that's part of the purpose behind the GPL: force a license on derivative code so that at least there are clear rights moving forwards.


It's stronger than that: if GitHub is correct that they don't need a license, then they are allowed to train on publicly visible code even if it is labeled with "no one has any permission to use this for anything at all, especially training models".


Which is why I think this could be a big turning point. IMO, GitHub is breaking licenses. If an ML algorithm ingests a block of virally licensed code, its outputs should be tainted with that license, as they are a derived work. Otherwise I can make a program reproduce whole repositories license-free, so long as I can claim "well, the AI did it, not me!" It has produced something based on the original work, therefore it should follow the license of the original. And that issue is exacerbated by the mixture of licenses in the training data: they will all apply at the same time, and not all are compatible.

I would hope GitHub (and Microsoft) did the legal work to cover this, and didn't just plough ahead with a plan to drown any legal challenges. From my perspective, they're doing the latter.


This isn't as clear as most things we work on as engineers, but there's a spectrum:

* An algorithm (or person) that ingests lots of code and later spits out that same input does not free anyone from the copyrights on the input.

* An algorithm (or person) that ingests lots of code, finds commonalities, synthesizes that into something new, and produces something well beyond mere copying is producing something new, likely without any legal tie to the original.

Right now, it looks like most of what Copilot does is closer to the latter, but sometimes it does things that are closer to the former? I can't see any reason why they wouldn't be able to fix it to avoid regurgitating its input, however, with something like a Bloom filter, so I expect that long-term there's a way to do it that falls entirely within fair use?
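To make the Bloom filter idea concrete, here's a rough sketch of how such a check might look. This is not how Copilot works as far as I know; the names (`BloomFilter`, `shingles`, `index_corpus`, `looks_regurgitated`), the whitespace tokenization, the 8-token window, and the 50% threshold are all assumptions I picked for illustration. The idea is to index every run of n consecutive tokens from the training code, then flag a suggestion when too many of its own runs show up in the filter.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, rare false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride odd
        return [(h1 + i * h2) % self.size_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def shingles(code, n=8):
    """Split on whitespace and return every run of n consecutive tokens."""
    tokens = code.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def index_corpus(files, n=8):
    """Add every n-token run from the (hypothetical) training files."""
    bf = BloomFilter()
    for text in files:
        for gram in shingles(text, n):
            bf.add(gram)
    return bf


def looks_regurgitated(bf, suggestion, n=8, threshold=0.5):
    """Flag a suggestion if at least `threshold` of its runs are in the filter."""
    grams = shingles(suggestion, n)
    if not grams:
        return False  # too short to judge with this window size
    hits = sum(1 for gram in grams if gram in bf)
    return hits / len(grams) >= threshold
```

Usage would be something like `bf = index_corpus(training_files)` once, then `looks_regurgitated(bf, suggestion)` on each output before showing it. A Bloom filter can give false positives but never false negatives, so a check like this would occasionally suppress original output, but it would not miss a verbatim copy that is at least one window long. It does nothing about lightly edited copies (renamed variables and so on), which is a separate problem.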



