Big Data Size Debates

Discussions revolve around defining thresholds for small, medium, large, and huge datasets, debating data volumes like 500MB to TBs in databases and analytics, and when data fits in RAM versus requiring distributed systems.

📉 Falling 0.4x Databases
Comments: 4,327
Years Active: 20
Top Authors: 5
Topic ID: #1917

Activity Over Time

Year: Comments
2007: 11
2008: 28
2009: 88
2010: 101
2011: 95
2012: 172
2013: 232
2014: 169
2015: 259
2016: 325
2017: 314
2018: 239
2019: 259
2020: 301
2021: 327
2022: 339
2023: 374
2024: 378
2025: 289
2026: 27

Keywords

RAM GB DB DBA DOS XML ELT SQL AWS i.e data ram datasets rows nodes size dataset fit large db

Sample Comments

nikita Mar 20, 2018

Can you share the whole use case as well as data size?

morcus Sep 15, 2025

Surely most queries should process much less than 1 TB of data?

outside1234 Oct 31, 2019

I see you haven't worked with a truly titanic amount of data then :)

Xorlev Nov 12, 2014

You probably aren't the target audience if your database fits in RAM.

vicaya May 22, 2009

Sorry, but a 500MB DB is a tiny dataset these days (anything < 1GB is tiny, < 4GB is small, < RAM on a single node (~8GB-64GB) is medium, < disks on a single node (~128GB to a few TB) is large; a huge dataset requires multiple nodes and is typically above 128TB).
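
The tiers in the comment above can be sketched as a small classifier. This is only an illustration: the function name, the binary units, and the default node sizes (64GB RAM, 2TB disk) are assumptions picked from within the ranges the comment quotes, not values the commenter specified.

```python
# Assumption: binary units; the comment does not say which convention it uses.
GB = 1024 ** 3
TB = 1024 * GB


def classify_dataset(size_bytes, node_ram_bytes=64 * GB, node_disk_bytes=2 * TB):
    """Map a dataset size to the tiers from the 2009 comment.

    node_ram_bytes and node_disk_bytes are hypothetical defaults for a
    single node; pass the figures for your own hardware.
    """
    if size_bytes < 1 * GB:
        return "tiny"
    if size_bytes < 4 * GB:
        return "small"
    if size_bytes < node_ram_bytes:
        return "medium"  # still fits in one node's RAM
    if size_bytes < node_disk_bytes:
        return "large"   # fits on one node's disks, but not in RAM
    return "huge"        # needs multiple nodes


if __name__ == "__main__":
    for label, size in [("500MB", 500 * 1024 ** 2), ("1TB", 1 * TB)]:
        print(label, "->", classify_dataset(size))
```

Because the medium and large boundaries depend on the hardware, the same dataset can move between tiers as node sizes grow, which is much of what the thread argues about.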

delive Nov 8, 2024

What would you consider to be small or medium? I have a use case for analytics on ~1 billion rows that are about 1TB in postgres. Have you tried on that volume?
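
As a rough sanity check on figures like these, the average footprint per row follows from dividing total size by row count. The sketch below assumes decimal terabytes and treats the quoted on-disk size as if it were all row data, which overstates it if indexes and storage overhead are included; the function name is hypothetical.

```python
def avg_row_bytes(total_bytes, row_count):
    """Average bytes per row, given a total size and a row count."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    return total_bytes / row_count


if __name__ == "__main__":
    one_tb = 10 ** 12           # assumption: decimal terabyte
    one_billion_rows = 10 ** 9
    print(avg_row_bytes(one_tb, one_billion_rows))  # 1000.0
```

At roughly 1KB per row, the table is far beyond the RAM sizes quoted earlier in the thread but within the disk capacity of a single node.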

Maybe they are dealing with a few gig of CSV time series?

subleq Aug 11, 2017

I don't have any experience with this type of thing, so that sounds like an incredibly large amount of data. What are you doing that requires it? What type of useful queries are you even able to perform over 432 billion records?

dvirsky Apr 18, 2014

9M rows? Luxury! Big data starts at 100M rows! /s

kcorbitt Nov 22, 2013

Data too large to fit in memory; anything that you're going to want to query relationally (as opposed to just key/value) in the future.