AI Training Data
This cluster centers on discussions about the sources, availability, and quality of training data for AI models and LLMs. Users frequently ask what data was used and express curiosity or skepticism about its provenance.
➡️ Stable 0.7x AI & Machine Learning
Comments: 5,361
Years Active: 20
Top Authors: 5
Topic ID: #3658
Activity Over Time
2007 2
2008 3
2009 12
2010 12
2011 15
2012 19
2013 25
2014 33
2015 55
2016 145
2017 179
2018 177
2019 196
2020 225
2021 249
2022 407
2023 1,340
2024 991
2025 1,179
2026 99
Keywords
AI NN LLM businessinsider.com LinkedIn GPT TV SetFit training data training data trained ai train llm answers data like data used
Sample Comments
I would love to see the providence of their training data.
AI? What was the training data? Would seem a bit thin on the ground.
Do you understand that data can be used for training?
All they're doing is giving it training data
Someone needs some training data for their model?
Open-source, eh? Where's the training data, then?
I wonder what they used as training data?
training is only one of many uses for the data.
The training data isn't available to the public.
The LLMs had to be trained on something.