AI Training Copyright Debate
This cluster discusses the ethics, legality, and perceived hypocrisy of AI companies such as OpenAI training models on copyrighted or unlicensed data (e.g., scraped web content, pirated ebooks) under fair use claims, while prohibiting others from using their outputs to train other models.
Sample Comments
Sorry, it's now a problem to train off other people's data? Surely openai has never trained off other people's data without permission...
So OpenAI licenses the content they train on... They just admitted it has value.
Wouldn't it be still legal to train on the data due to fair use?
Why not? OpenAI used data to train their models without receiving permission from the authors.
It was trained on data they don't own. They could face a lawsuit for this, as has happened with image generation models.
Can the company just claim it’s for AI training and it’s fair use?
It’s not so innocent, https://stackdiary.com/brave-selling-copyrighted-data-for-ai...
That only works when AI training isn't considered fair use.
Don't they have an explicit T&C that says you are not allowed to use their output for training other models?
Probably that data was used to train AI models too. I hope we establish a legal framework that prevents training models without proper permission, and the companies that have already trained their models will get fined and those models will be banned from commercial use. I enjoy the rapid progress of LLMs. ChatGPT and Claude are already a critical part of my daily work. But I don't like the current situation where VCs and start-ups use unpermitted data to train the models, don't resp