Could this be the beginning of the true test of open source licenses? My underst...

jcelerier · on July 8, 2021

What? There have been plenty of GPL cases defended in court.

https://en.m.wikipedia.org/wiki/Open_source_license_litigati...

laurowyn · on July 8, 2021

All of the copyright cases were settled, so no precedent is set. Open source as a contract has been ruled legal, and licensors can sue for breach of contract, which is not the same as copyright infringement.

I think my point still stands.

jefftk · on July 8, 2021

GitHub used code that wasn't under any license at all, just publicly visible. Their claim is not that the license allows what they're doing, but that they do not need a license.

laurowyn · on July 8, 2021

Which is a different issue to my point, but still very valid. What terms are implied if no license is specified? I would argue attribution should be expected if the code is used, but I also wouldn't go near any code without a specific license attached, as there's no express permission given. Just because a license isn't disclosed doesn't mean it isn't there.

You can't go copying anything and everything just because nobody has told you that you can't. I feel that's part of the purpose behind the GPL: force a license on derivative code so that at least there are clear rights moving forward.

jefftk · on July 8, 2021

It's stronger than that: if GitHub is correct that they don't need a license, then they are allowed to train on publicly visible code even if it is labeled with "no one has any permission to use this for anything at all, especially training models".

laurowyn · on July 8, 2021

Which is why I think this could be a big turning point. IMO, GitHub is breaking licenses. If an ML algorithm ingests a virally licensed block of code, its outputs should be tainted with that license, as they are a derived work. Otherwise I can make a program reproduce whole repositories license-free, so long as I can claim "well, the AI did it, not me!" It's produced something based on the original work, therefore it should follow the license of the original. And that issue is exacerbated by the mixture of licenses available: they will all apply at the same time, and not all are compatible.

I would hope GitHub (and Microsoft) did the legal work to cover this, and not just ploughed ahead with the plan to drown any legal challenges. From my perspective, they're doing the latter.

jefftk · on July 9, 2021

This isn't as clear as most things we work on as engineers, but there's a spectrum:

* An algorithm (or person) ingesting lots of code and then later spitting out that same input does not free anyone from the copyrights of the input.

* An algorithm (or person) that ingests lots of code, finds commonalities, synthesizes that into something new, and produces something well beyond mere copying is producing something new, likely without any legal tie to the original.

Right now, it looks like most of what Copilot does is closer to the latter, but sometimes it does things that are closer to the former. I can't see any reason why they wouldn't be able to fix it to avoid regurgitating its input, however, with something like a Bloom filter, so I expect that long-term there's a way to do it that falls entirely within fair use.
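
To make the Bloom filter idea concrete, here is a minimal sketch of how a regurgitation check might work. Everything in it is an assumption for illustration, not a description of anything GitHub actually does: the names `BloomFilter`, `shingles`, `build_index`, and `looks_regurgitated` are hypothetical, tokens are split on whitespace, and the 8-token window and filter size are arbitrary. Training code is broken into overlapping token runs that are added to the filter, and a generated suggestion is flagged if any of its runs appears to be in the filter.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def shingles(code, n=8):
    """Yield every run of n whitespace-separated tokens as a single string."""
    tokens = code.split()
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def build_index(training_snippets, n=8):
    """Add every n-token run from the (assumed) training snippets to a filter."""
    index = BloomFilter()
    for snippet in training_snippets:
        for shingle in shingles(snippet, n):
            index.add(shingle)
    return index


def looks_regurgitated(output, index, n=8):
    """True if any n-token run of the output appears to be in the index."""
    return any(shingle in index for shingle in shingles(output, n))
```

Because a Bloom filter can return false positives but never false negatives, a check like this would occasionally block an original suggestion but would not miss a run it had indexed. It only catches verbatim token runs, so renamed variables or reordered statements would slip through, and whether passing such a check has any bearing on fair use is a separate legal question.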