ML Training Data Limits

Comments discuss the scarcity, quality, and size limitations of training data as the primary bottleneck for machine learning models, often more critical than compute power.

📉 Falling 0.4x · AI & Machine Learning
3,114 Comments · 20 Years Active · 5 Top Authors · Topic ID #9101

Activity Over Time (comments per year)

2007: 3 · 2008: 3 · 2009: 5 · 2010: 8 · 2011: 21 · 2012: 33 · 2013: 18 · 2014: 47 · 2015: 70 · 2016: 217 · 2017: 323 · 2018: 257 · 2019: 236 · 2020: 254 · 2021: 267 · 2022: 285 · 2023: 436 · 2024: 335 · 2025: 284 · 2026: 12

Keywords

e.g · II · ADA · GB · ROI · MLP · ImageNet · AI · FWIW · OVH · training · training data · dataset · data · images · training set · datasets · recognition · learning · ml

Sample Comments

croes · Aug 16, 2023

Because you need more training data for better results and they are running out of new training data.

Hydraulix989 · Dec 22, 2016

You can't possibly get enough training data for this.

danuker · Apr 2, 2022

I'd imagine training data would be the limiting factor.

snowstormsun · Dec 27, 2023

Because they wouldn't have enough good quality training data then probably.

MuffinFlavored · Dec 1, 2022

I could be wrong but I think part of the issue is this needs some large files for the trained dataset?

tracer4201 · Dec 5, 2018

Disclaimer: Not an ML expert. I suspect that's a function of the size and quality of their training set?

jstx1 · Apr 17, 2023

Smaller % of training data doesn't necessarily mean lower quality.
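The point that a smaller fraction of training data does not automatically mean a worse model is easy to check empirically. The following is a minimal sketch (assuming scikit-learn; the digits dataset and the 25% fraction are arbitrary choices for illustration) that trains the same classifier on the full training set and on a stratified subsample, then compares held-out accuracy:

```python
# Sketch: does training on 25% of the data collapse accuracy? (scikit-learn assumed)
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Classifier trained on the full training split
full = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)

# Same classifier trained on a stratified 25% subsample of that split,
# so all ten digit classes stay represented
X_sub, _, y_sub, _ = train_test_split(
    X_tr, y_tr, train_size=0.25, stratify=y_tr, random_state=0)
sub = LogisticRegression(max_iter=2000).fit(X_sub, y_sub)

print(f"full data accuracy: {full.score(X_te, y_te):.3f}")
print(f"25% subset accuracy: {sub.score(X_te, y_te):.3f}")
```

On a small, clean dataset like this, the gap is typically modest, which illustrates the comment's point: quantity and quality are separate axes, and a well-sampled subset can remain competitive.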

drawnwren · May 13, 2024

Computer power is not stagnating, but the availability of training data is. It's not like there's a second stackoverflow or reddit to scrape.

amelius · Apr 17, 2020

The problem is the amount of pictures you'd need. It's much easier to use available datasets if you know how to preprocess the data.
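The suggestion to lean on existing datasets plus preprocessing, rather than collecting new images, can be sketched briefly. This example (assuming scikit-learn; the dataset and model are illustrative stand-ins, not anything the commenter specified) loads a ready-made image dataset and bundles the preprocessing step into a pipeline so the scaler is fit only on each training fold:

```python
# Sketch: reuse an available image dataset and preprocess it, instead of
# collecting new pictures (scikit-learn assumed)
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 1,797 ready-made 8x8 grayscale digit images

# Putting the scaler inside the pipeline means it is re-fit on each training
# fold, so no statistics leak from the validation folds
clf = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```

The design point is the pipeline: preprocessing is part of the model, so whatever "available dataset" is swapped in, the normalization travels with it.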

paganel · Apr 28, 2017

Where "enough training examples" has proven to be the real difficult problem.