Copilot Copyright Debate
This cluster covers debates over whether GitHub Copilot violates software licenses such as the GPL by training on public repositories and regurgitating code snippets, and over the related questions of fair use and copyright infringement.
Sample Comments
Anything posted to Stack Overflow has a specific (Creative Commons IIRC) license associated with it. The same is not true of GitHub Copilot, and in fact their FAQ doesn’t specify a license at all, probably because they are technically unable to since it is trained on a wide variety of code from differing licenses (and code not written by a human is currently a grey area for copyright). The FAQ simply says to use it at your own risk.
How much of this is due to someone ripping off GPL code and stuffing it into a repo under a different license, which then got fed into Copilot's training?
Copilot is not a competing solution; it's a knowledge base about text, like an encyclopedia. As for the snippets it produces, those might be copyrightable if they pass the copyrightability threshold. If it gave you kilobytes of text at once, that would be bad. A middle ground would be Copilot tracking how much code under incompatible licenses it has pasted and stopping at, say, 200 LOC.
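The "stop at, say, 200 LOC" middle ground proposed in the comment above amounts to a running budget check. Below is a minimal Python sketch of that idea, assuming a hypothetical `LicenseBudget` class and assuming each suggestion arrives already tagged with whether its source license is compatible. Nothing here is a real Copilot API or feature; it only illustrates the commenter's proposal.

```python
from dataclasses import dataclass


# Hypothetical illustration of the commenter's proposal, not a real Copilot API.
@dataclass
class LicenseBudget:
    limit: int = 200  # max lines of incompatibly licensed code (the comment's "200 LOC")
    used: int = 0     # lines of incompatible code pasted so far

    def allow(self, snippet: str, license_compatible: bool) -> bool:
        """Return True if the snippet may be suggested, updating the running total."""
        if license_compatible:
            return True  # compatible code does not count against the budget
        lines = len(snippet.splitlines())
        if self.used + lines > self.limit:
            return False  # suggesting this would exceed the cap, so stop
        self.used += lines
        return True


budget = LicenseBudget(limit=200)
print(budget.allow("x = 1\n" * 150, license_compatible=False))  # True: 150 lines used
print(budget.allow("y = 2\n" * 60, license_compatible=False))   # False: 210 would exceed 200
print(budget.allow("z = 3\n" * 60, license_compatible=True))    # True: not counted
print(budget.used)                                               # 150
```

The hard part the sketch assumes away is the `license_compatible` flag itself: deciding whether a generated snippet derives from code under an incompatible license is exactly what the thread is arguing about.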
Copilot was not only trained on permissively licensed code. It's trained on all public repos, even when the code is copyrighted (which is the default absent a more permissive license).
I mean, I'm not an expert, but it's a valid point: people share code under a given license, and as far as I'm aware Copilot does not make that information available. This has nothing to do with the fact that Copilot is an amazing technological achievement. If I, as a human, go to a public repository on GitHub and copy/paste a non-trivial 200-line code snippet into my proprietary code base, I have to abide by the license of that original code, even if I slightly modify it. I don't
Copilot generally (except in rare cases where it produces snippets verbatim) does not steal code. The GPL restricts distribution, not usage. And (to my knowledge) no open-source license restricts learning from code. I cannot imagine anyone who doesn't want others to learn from their code ever releasing it as open source.
There is neither an obvious violation nor an obvious non-violation. It is a matter of fair use, and it will be settled in court. Using copyrighted code without open sourcing the derivative work (Copilot's model) may very well be a violation.
Can you copy 10 lines of code from an open-source project into your software? Yes, you can; it's considered fair use. Nobody will ever sue for that. If it weren't, websites like Stack Overflow, where developers post code probably taken from projects with restrictive licenses and other developers copy it into their own projects, would not exist. Copilot will not write an entire software module; it will provide you with snippets. I see using GPL code for training as fair use. If a developer reads the source co
Can't they just say the code was randomly generated by Copilot so copyright and stuff like that doesn't apply?
Without using Copilot, I can search the Internet for help, find some source-available code, take it verbatim and place it in my project. For most licenses, this is a violation if I ever ship a product built from this code externally, and for some licenses it is a violation even if I only expose it as a service over the Internet. A succinct way to rephrase this might be "theft". Using Copilot, I can do the same (GitHub acknowledges that according to their testing, verbatim reproduction happ