AI Training Data

This cluster centers on discussions about the sources, availability, and quality of training data for AI models and LLMs, with users frequently questioning what data was used and expressing curiosity or skepticism about its provenance.

➡️ Stable 0.7x AI & Machine Learning
Comments: 5,361
Years Active: 20
Top Authors: 5
Topic ID: #3658

Activity Over Time

2007: 2
2008: 3
2009: 12
2010: 12
2011: 15
2012: 19
2013: 25
2014: 33
2015: 55
2016: 145
2017: 179
2018: 177
2019: 196
2020: 225
2021: 249
2022: 407
2023: 1,340
2024: 991
2025: 1,179
2026: 99

Keywords

AI NN LLM businessinsider.com LinkedIn GPT TV SetFit training data training data trained ai train llm answers data like data used

Sample Comments

iosjunkie May 21, 2024

I would love to see the providence of their training data.

kayo_20211030 Feb 9, 2023

AI? What was the training data? Would seem a bit thin on the ground.

killerstorm Jun 6, 2025

Do you understand that data can be used for training?

Kakashi4 May 28, 2024

All they're doing is giving it training data

1270018080 Jul 4, 2023

Someone needs some training data for their model?

Meneth Sep 3, 2025

Open-source, eh? Where's the training data, then?

wolframarnold Apr 13, 2016

I wonder what they used as training data?

asadotzler Jun 14, 2024

training is only one of many uses for the data.

grumbel May 1, 2022

The training data isn't available to the public.

dartos Dec 25, 2023

The LLMs had to be trained on something.