Large Directory Performance

This cluster discusses how filesystems slow down when directories contain many files, including slow `ls` operations even on powerful hardware, across Linux, Windows, and macOS. Users debate root causes, optimizations such as metadata caching, and comparisons with alternative approaches.

📉 Falling (0.4x) · DevOps & Infrastructure
Comments: 3,484
Years Active: 20
Top Authors: 5
Topic ID: #6963

Activity Over Time

Year   Comments
2007          2
2008         10
2009         51
2010         57
2011        133
2012        110
2013        119
2014        134
2015        129
2016        182
2017        226
2018        183
2019        213
2020        258
2021        383
2022        331
2023        346
2024        338
2025        264
2026         15

Keywords

RAM, e.g, REPL, SSD, foo.l, OK, toronto.edu, ftp.cs, UTF8, foo.c, file, files, filesystem, foo, directory, disk, filesystems, slow, sizes, faster

Sample Comments

When you have 100k+ files, sometimes the filesystem itself matters. Have you set your expectations appropriately, i.e. compared it to a raw ls/dir?

mpweiher Jan 15, 2023 View on HN
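
A minimal sketch of the baseline check this comment suggests, assuming Python on a POSIX-like system; the directory path is a placeholder:

```python
# Time a raw directory scan, independent of whatever an application layers on
# top of it -- roughly what an unsorted, stat-free `ls -f` does.
import os
import time

def time_raw_listing(path):
    """Count entries with a plain os.scandir() pass and report elapsed time."""
    start = time.perf_counter()
    count = sum(1 for _ in os.scandir(path))
    return count, time.perf_counter() - start

if __name__ == "__main__":
    # /path/with/many/files is a placeholder; point it at the directory in question.
    n, secs = time_raw_listing("/path/with/many/files")
    print(f"{n} entries scanned in {secs:.2f}s")
```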

More accurate title: "Operations within a file are faster than directory operations"

eigenvalue Aug 28, 2023 View on HN

In my experience, the standard Linux file system can get very slow even on super powerful machines when you have too many files in a directory. I recently generated ~550,000 files in a directory on a 64-core machine with 256 GB of RAM and an SSD, and it took around 10 seconds to run `ls` on it. So that could be a part of it too.

rurban Feb 12, 2022 View on HN
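
A hedged way to reproduce the slowdown described above, in Python; the file count, name pattern, and target directory are arbitrary choices for illustration, and real timings depend heavily on the filesystem and kernel caches:

```python
# Create N empty files in one directory, then time a bare readdir pass versus
# a readdir-plus-stat pass (closer to a plain `ls -l`).
import os
import time

def populate(path, n):
    os.makedirs(path, exist_ok=True)
    for i in range(n):
        # Empty files are enough; the number of directory entries is what matters.
        open(os.path.join(path, f"f{i:07d}"), "w").close()

def time_listing(path):
    t0 = time.perf_counter()
    names = os.listdir(path)                      # directory entries only, like `ls -f`
    t1 = time.perf_counter()
    for name in names:                            # one stat per entry, like `ls -l`
        os.stat(os.path.join(path, name))
    t2 = time.perf_counter()
    return len(names), t1 - t0, t2 - t1

if __name__ == "__main__":
    target = "/tmp/manyfiles"      # placeholder directory
    populate(target, 100_000)      # adjust N to taste
    n, scan, stats = time_listing(target)
    print(f"{n} files: readdir {scan:.2f}s, per-entry stat {stats:.2f}s")
```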

How about a fast filesystem? This is by far the slowest.

cperciva Feb 7, 2011 View on HN

Well, one common issue is filesystems slowing down if data structures (e.g. directories) get big. Creating a single file doesn't take a constant number of disk operations.

PretzelFisch Feb 12, 2017 View on HN
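
A small probe of the point above, as a Python sketch: if directory data structures grow with the number of entries, creating the k-th file should get slower as k grows. Batch size and target path are arbitrary.

```python
# Time file creation in fixed-size batches so any growth trend is visible.
import os
import time

def creation_trend(path, batches=10, batch_size=20_000):
    os.makedirs(path, exist_ok=True)
    created = 0
    for _ in range(batches):
        t0 = time.perf_counter()
        for _ in range(batch_size):
            open(os.path.join(path, f"f{created:08d}"), "w").close()
            created += 1
        dt = time.perf_counter() - t0
        print(f"files {created - batch_size:>8}-{created:<8} created in {dt:.2f}s")

if __name__ == "__main__":
    creation_trend("/tmp/creation-trend")   # placeholder directory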

Hasn't the file system improved to the point where this is less of a problem?

wvh Nov 21, 2021 View on HN

Probably having to check whether the actual disk entries changed is what slows it down. I wonder if it would be possible, with today's memory sizes, to keep all metadata in memory as a write-through cache. Not sure if it'd be worth it though; my system has close to half a million files, but I'm only interested in about a hundred or so. I don't think file systems are slow in practice for typical human-scale operations though, with the exception of non-indexed "search all my files".

nightfly Sep 9, 2023 View on HN
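
A toy version of the idea above, sketched in Python: keep file metadata in an in-memory map and write through to the real filesystem on every update, so repeated metadata reads never touch the disk. This is only an application-level cache, not a change to the filesystem itself, and the class and method names are invented for illustration.

```python
import os
from dataclasses import dataclass

@dataclass
class Meta:
    size: int
    mtime_ns: int

class MetadataCache:
    def __init__(self):
        self._meta = {}                            # path -> Meta

    def stat(self, path):
        # Serve from memory if we have seen this path before.
        cached = self._meta.get(path)
        if cached is not None:
            return cached
        st = os.stat(path)                         # cold miss: hit the disk once
        meta = Meta(st.st_size, st.st_mtime_ns)
        self._meta[path] = meta
        return meta

    def write(self, path, data):
        # Write-through: update the file first, then refresh the cached entry.
        with open(path, "wb") as f:
            f.write(data)
        st = os.stat(path)
        self._meta[path] = Meta(st.st_size, st.st_mtime_ns)
```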

Doing it at the block device level, you're gonna have stuff stick around in cache; doing it file-wise would blow things up faster.

dataflow Jun 17, 2021 View on HN

I'm confused about what this has to do with calculating file sizes. Time spent computing file sizes is dwarfed by I/O, right?

NightlyDev Apr 6, 2018 View on HN
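
A rough illustration of the point above, assuming Python: the additions themselves are trivial, and nearly all the time goes into the per-entry metadata lookups. The path is a placeholder.

```python
# Sum file sizes under one directory and time the whole pass. On Linux,
# entry.stat() is a separate syscall per entry; on Windows it can be served
# from data returned during the directory scan.
import os
import time

def total_size(path):
    t0 = time.perf_counter()
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total, time.perf_counter() - t0

if __name__ == "__main__":
    size, secs = total_size("/path/with/many/files")
    print(f"{size} bytes summed in {secs:.2f}s")
```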

Reading a bunch of small files from a volume is insanely slow for me in Windows, so yeah...
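
A hedged way to quantify the "many small files" cost this comment describes, in Python: read N small files one by one and compare against reading a comparable amount of data from a single large file. The paths are placeholders and the numbers will vary widely by OS, filesystem, and antivirus/caching behavior.

```python
import os
import time

def read_small_files(paths):
    # Open and fully read each small file in turn.
    t0 = time.perf_counter()
    total = 0
    for p in paths:
        with open(p, "rb") as f:
            total += len(f.read())
    return total, time.perf_counter() - t0

def read_one_big_file(path):
    # Read the same order of magnitude of bytes from a single file.
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        total = len(f.read())
    return total, time.perf_counter() - t0

if __name__ == "__main__":
    small_dir = "/path/to/small/files"      # placeholder
    big_file = "/path/to/one/big/file"      # placeholder
    paths = [e.path for e in os.scandir(small_dir) if e.is_file()]
    print(read_small_files(paths))
    print(read_one_big_file(big_file))
```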