AI Training Copyright Debate

This cluster centers on debates about the legality of training AI models and LLMs on copyrighted material, including arguments over fair use and derivative works, ongoing lawsuits, and whether such training constitutes infringement.

Trend: Stable (0.7x)
Category: AI & Machine Learning
Comments: 6,833
Years Active: 14
Top Authors: 5
Topic ID: #4202

Activity Over Time

2013: 2
2014: 1
2015: 2
2016: 19
2017: 8
2018: 17
2019: 46
2020: 65
2021: 239
2022: 421
2023: 2,552
2024: 1,381
2025: 1,973
2026: 109

Keywords

EULA e.g AI US LLM MP3 ML reuters.com GPL OSI copyright fair use copyrighted training model fair data ai trained models

Sample Comments

grantseltzer, Dec 30, 2023

Can someone with actual fundamental understanding of LLMs explain to me why they think it's perfectly legal to train models on copyrighted material? I don't know enough about this. Please don't answer by asking chatgpt.

huhuhu111, May 9, 2024

Isn't all/most of their training data copyrighted anyways? We just have to say it's fair use, because it is useful to everyone. Maybe just require them to open their model.

hawski, Mar 15, 2023

Why can they train their model on copyrighted data, claiming fair use because they do not outright copy, while disallowing training other models on their output? I understand revoking access though.

yladiz, Jan 28, 2024

This exact question is currently being litigated, and using copyrighted material when training an LLM or AI model without permission could very well be illegal.

throwawayxcmz, Aug 15, 2025

What do you mean? Is this about AI training on copyrighted material?

robbedpeter, Jul 27, 2021

Using text to train models is legally considered fair use. You can use copyrighted material, you just can't redistribute it. The output is considered the responsibility of the user of the software. It's up to the user to ensure code or other content they're using isn't licensed or copyrighted.

gradys, Jul 29, 2021

Could you explain why you think training models on copyrighted text is illegal or copyright infringement or whatever else it might be?

jefftk, Oct 17, 2022

Probably not? Models are trained on restrictively licensed things all the time, such as images that are still in copyright. This is generally believed to be fair use, though I think this has not been tested in court?

tapoxi, Jan 29, 2025

Oh I see, so training on copyrighted content is fine unless it's your AI model...

nologic01, Oct 13, 2023

Does this have any implications for training models on copyrighted material?