Large Directory Performance

This cluster discusses how filesystems slow down when directories contain many files, including slow `ls` operations even on powerful hardware, across Linux, Windows, and macOS. Users debate root causes, optimizations such as metadata caching, and comparisons with alternative approaches.

📉 Falling (0.4x) · DevOps & Infrastructure
Comments: 3,484
Years Active: 20
Top Authors: 5
Topic ID: #6963

Activity Over Time

Year   Comments
2007          2
2008         10
2009         51
2010         57
2011        133
2012        110
2013        119
2014        134
2015        129
2016        182
2017        226
2018        183
2019        213
2020        258
2021        383
2022        331
2023        346
2024        338
2025        264
2026         15

Keywords

RAM, e.g, REPL, SSD, foo.l, OK, toronto.edu, ftp.cs, UTF8, foo.c, file, files, filesystem, foo, directory, disk, filesystems, slow, sizes, faster

Sample Comments

When you have 100k+ files, sometimes the filesystem itself matters. Have you set your expectations appropriately, i.e. compared it to a raw ls/dir?

mpweiher Jan 15, 2023 View on HN
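
A minimal sketch of the baseline check this comment suggests, assuming Python on a POSIX-like system; the directory path is a placeholder:

```python
# Time a raw directory scan, independent of whatever an application layers on
# top of it -- roughly what an unsorted, stat-free `ls -f` does.
import os
import time

def time_raw_listing(path):
    """Count entries with a plain os.scandir() pass and report elapsed time."""
    start = time.perf_counter()
    count = sum(1 for _ in os.scandir(path))
    return count, time.perf_counter() - start

if __name__ == "__main__":
    # /path/with/many/files is a placeholder; point it at the directory in question.
    n, secs = time_raw_listing("/path/with/many/files")
    print(f"{n} entries scanned in {secs:.2f}s")
```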

More accurate title: "Operations within a file are faster than directory operations"

eigenvalue Aug 28, 2023 View on HN

In my experience, the standard Linux file system can get very slow even on super powerful machines when you have too many files in a directory. I recently generated ~550,000 files in a directory on a 64-core machine with 256 GB of RAM and an SSD, and it took around 10 seconds to run `ls` on it. So that could be a part of it too.

rurban Feb 12, 2022 View on HN
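
A hedged way to reproduce the slowdown described above, in Python; the file count, name pattern, and target directory are arbitrary choices for illustration, and real timings depend heavily on the filesystem and kernel caches:

```python
# Create N empty files in one directory, then time a bare readdir pass versus
# a readdir-plus-stat pass (closer to a plain `ls -l`).
import os
import time

def populate(path, n):
    os.makedirs(path, exist_ok=True)
    for i in range(n):
        # Empty files are enough; the number of directory entries is what matters.
        open(os.path.join(path, f"f{i:07d}"), "w").close()

def time_listing(path):
    t0 = time.perf_counter()
    names = os.listdir(path)                      # directory entries only, like `ls -f`
    t1 = time.perf_counter()
    for name in names:                            # one stat per entry, like `ls -l`
        os.stat(os.path.join(path, name))
    t2 = time.perf_counter()
    return len(names), t1 - t0, t2 - t1

if __name__ == "__main__":
    target = "/tmp/manyfiles"      # placeholder directory
    populate(target, 100_000)      # adjust N to taste
    n, scan, stats = time_listing(target)
    print(f"{n} files: readdir {scan:.2f}s, per-entry stat {stats:.2f}s")
```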

How about a fast filesystem? This is by far the slowest.

cperciva Feb 7, 2011 View on HN

Well, one common issue is filesystems slowing down if data structures (e.g. directories) get big. Creating a single file doesn't take a constant number of disk operations.

PretzelFisch Feb 12, 2017 View on HN
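
A small probe of the point above, as a Python sketch: if directory data structures grow with the number of entries, creating the k-th file should get slower as k grows. Batch size and target path are arbitrary.

```python
# Time file creation in fixed-size batches so any growth trend is visible.
import os
import time

def creation_trend(path, batches=10, batch_size=20_000):
    os.makedirs(path, exist_ok=True)
    created = 0
    for _ in range(batches):
        t0 = time.perf_counter()
        for _ in range(batch_size):
            open(os.path.join(path, f"f{created:08d}"), "w").close()
            created += 1
        dt = time.perf_counter() - t0
        print(f"files {created - batch_size:>8}-{created:<8} created in {dt:.2f}s")

if __name__ == "__main__":
    creation_trend("/tmp/creation-trend")   # placeholder directory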

Hasn't the file system improved to the point where this is less of a problem?

wvh Nov 21, 2021 View on HN

Probably having to check whether the actual disk entries changed is what slows it down. I wonder if it would be possible, with today's memory sizes, to keep all metadata in memory as a write-through cache. Not sure if it'd be worth it though; my system has close to half a million files, but I'm only interested in about a hundred or so. I don't think file systems are slow in practice for typical human-scale operations though, with the exception of non-indexed "search all my files".

nightfly Sep 9, 2023 View on HN
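
A toy version of the idea above, sketched in Python: keep file metadata in an in-memory map and write through to the real filesystem on every update, so repeated metadata reads never touch the disk. This is only an application-level cache, not a change to the filesystem itself, and the class and method names are invented for illustration.

```python
import os
from dataclasses import dataclass

@dataclass
class Meta:
    size: int
    mtime_ns: int

class MetadataCache:
    def __init__(self):
        self._meta = {}                            # path -> Meta

    def stat(self, path):
        # Serve from memory if we have seen this path before.
        cached = self._meta.get(path)
        if cached is not None:
            return cached
        st = os.stat(path)                         # cold miss: hit the disk once
        meta = Meta(st.st_size, st.st_mtime_ns)
        self._meta[path] = meta
        return meta

    def write(self, path, data):
        # Write-through: update the file first, then refresh the cached entry.
        with open(path, "wb") as f:
            f.write(data)
        st = os.stat(path)
        self._meta[path] = Meta(st.st_size, st.st_mtime_ns)
```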

Doing it at the block device level, you're gonna have stuff stick around in cache; doing it file-wise would blow things up faster.

dataflow Jun 17, 2021 View on HN

I'm confused about what this has to do with calculating file sizes. Time spent computing file sizes is dwarfed by I/O, right?

NightlyDev Apr 6, 2018 View on HN
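
A rough illustration of the point above, assuming Python: the additions themselves are trivial, and nearly all the time goes into the per-entry metadata lookups. The path is a placeholder.

```python
# Sum file sizes under one directory and time the whole pass. On Linux,
# entry.stat() is a separate syscall per entry; on Windows it can be served
# from data returned during the directory scan.
import os
import time

def total_size(path):
    t0 = time.perf_counter()
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total, time.perf_counter() - t0

if __name__ == "__main__":
    size, secs = total_size("/path/with/many/files")
    print(f"{size} bytes summed in {secs:.2f}s")
```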

Reading a bunch of small files from a volume is insanely slow for me in Windows, so yeah...
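
A hedged way to quantify the "many small files" cost this comment describes, in Python: read N small files one by one and compare against reading a comparable amount of data from a single large file. The paths are placeholders and the numbers will vary widely by OS, filesystem, and antivirus/caching behavior.

```python
import os
import time

def read_small_files(paths):
    # Open and fully read each small file in turn.
    t0 = time.perf_counter()
    total = 0
    for p in paths:
        with open(p, "rb") as f:
            total += len(f.read())
    return total, time.perf_counter() - t0

def read_one_big_file(path):
    # Read the same order of magnitude of bytes from a single file.
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        total = len(f.read())
    return total, time.perf_counter() - t0

if __name__ == "__main__":
    small_dir = "/path/to/small/files"      # placeholder
    big_file = "/path/to/one/big/file"      # placeholder
    paths = [e.path for e in os.scandir(small_dir) if e.is_file()]
    print(read_small_files(paths))
    print(read_one_big_file(big_file))
```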