AI Training Copyright Debate

This cluster centers on debates about the legality of training AI models, including LLMs, on copyrighted material: arguments over fair use, derivative works, ongoing lawsuits, and whether such training constitutes infringement.

➡️ Stable 0.7x AI & Machine Learning

Comments: 6,833

Topic ID: #4202

Activity Over Time

Comments per year (the chart axis runs from 2013 to 2026; no counts are shown for 2013 through 2020):

2021: 239
2022: 421
2023: 2,552
2024: 1,381
2025: 1,973
2026: 109

Top Contributors

dragonwriter (80), visarga (58), chii (44), kmeisthax (39), ben_w (33)

Keywords

EULA e.g AI US LLM MP3 ML reuters.com GPL OSI copyright fair use copyrighted training model fair data ai trained models

Sample Comments

grantseltzer • Dec 30, 2023

Can someone with actual fundamental understanding of LLMs explain to me why they think it's perfectly legal to train models on copyrighted material? I don't know enough about this. Please don't answer by asking chatgpt.

huhuhu111 • May 9, 2024

Isn't all/most their training data copyrighted anyways? We just have to say it's fair use, because it is useful to everyone. Maybe just require them to open their model.

hawski • Mar 15, 2023

Why they can train their model on copyrighted data and claiming fair use as they do not outright copy while disallowing training other models on their output? I understand revoking access though.

yladiz • Jan 28, 2024

This exact question is currently being litigated, and using copyrighted material when training an LLM or AI model without permission could very well be illegal.

throwawayxcmz • Aug 15, 2025

What do you mean? is this about AI training on copyrighted material?

robbedpeter • Jul 27, 2021

Using text to train models is legally considered fair use. You can use copyrighted material, you just can't redistribute it. The output is considered the responsibility of the user of the software. It's up to the user to ensure code or other content they're using isn't licensed or copyrighted.

gradys • Jul 29, 2021

Could you explain why you think training models on copyrighted text is illegal or copyright infringement or whatever else it might be?

jefftk • Oct 17, 2022

Probably not? Models are trained on restrictively licensed things all the time, such as images that are still in copyright. This is generally believed to be fair use, though I think this has not been tested in court?

tapoxi • Jan 29, 2025

Oh I see, so training on copyrighted content is fine unless it's your AI model...

nologic01 • Oct 13, 2023

Does this have any implications for training models of copyrighted material?