AI Training Copyright Debate
This cluster centers on debates about the legality of training AI models and LLMs on copyrighted material, including arguments over fair use, derivative works, ongoing lawsuits, and whether such training constitutes infringement.
Sample Comments
Can someone with actual fundamental understanding of LLMs explain to me why they think it's perfectly legal to train models on copyrighted material? I don't know enough about this. Please don't answer by asking chatgpt.
Isn't all or most of their training data copyrighted anyway? We just have to say it's fair use, because it is useful to everyone. Maybe just require them to open their model.
Why can they train their model on copyrighted data, claiming fair use because they do not outright copy, while disallowing the training of other models on their output? I understand revoking access, though.
This exact question is currently being litigated, and using copyrighted material when training an LLM or AI model without permission could very well be illegal.
What do you mean? Is this about AI training on copyrighted material?
Using text to train models is legally considered fair use. You can use copyrighted material, you just can't redistribute it. The output is considered the responsibility of the user of the software. It's up to the user to ensure code or other content they're using isn't licensed or copyrighted.
Could you explain why you think training models on copyrighted text is illegal or copyright infringement or whatever else it might be?
Probably not? Models are trained on restrictively licensed things all the time, such as images that are still in copyright. This is generally believed to be fair use, though I think this has not been tested in court?
Oh I see, so training on copyrighted content is fine unless it's your AI model...
Does this have any implications for training models on copyrighted material?