Large Directory Performance
The cluster discusses how filesystems slow down when a directory contains many files (for example, `ls` taking seconds even on powerful hardware) across Linux, Windows, and macOS. Users debate the causes, optimizations such as metadata caching, and alternative approaches.
Activity Over Time
Top Contributors
Keywords
Sample Comments
When you have 100k+ files, sometimes the filesystem itself matters. Have you set your expectations appropriately, i.e. compared it to a raw `ls`/`dir`?
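A rough way to establish that raw-listing baseline is to time one pass over the directory with and without a per-entry metadata lookup. A minimal sketch, assuming a `target` path that stands in for the large directory:

```python
import os
import time

def time_listing(path, with_stat=False):
    """Time one pass over a directory: names only, or with a stat per entry (roughly what `ls -l` adds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as it:
        for entry in it:
            count += 1
            if with_stat:
                entry.stat()  # extra metadata lookup per entry
    return count, time.perf_counter() - start

if __name__ == "__main__":
    target = "."  # placeholder: point this at the large directory
    n, raw_s = time_listing(target)
    _, stat_s = time_listing(target, with_stat=True)
    print(f"{n} entries: names only {raw_s:.2f}s, with stat {stat_s:.2f}s")
```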
More accurate title: "Operations within a file are faster than directory operations"
In my experience, the standard Linux filesystem can get very slow even on super powerful machines when you have too many files in a directory. I recently generated ~550,000 files in a directory on a 64-core machine with 256 GB of RAM and an SSD, and it took around 10 seconds to do `ls` on it. So that could be a part of it too.
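For context, a small sketch that approximates this kind of test: create many empty files in a single directory, then time a plain listing. The file count is an assumption, scaled down from the ~550,000 in the comment to keep the run short:

```python
import os
import tempfile
import time

N_FILES = 50_000  # assumption: scaled down from ~550,000 to keep the run short

# Note: TemporaryDirectory may land on tmpfs; pass dir="/path/on/ssd" to match the comment's setup.
with tempfile.TemporaryDirectory() as d:
    t0 = time.perf_counter()
    for i in range(N_FILES):
        open(os.path.join(d, f"f{i:07d}"), "w").close()
    create_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    names = os.listdir(d)  # plain, unsorted listing; `ls` additionally sorts and may stat entries
    list_s = time.perf_counter() - t0

    print(f"created {N_FILES} files in {create_s:.1f}s, listed {len(names)} in {list_s:.2f}s")
```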
How about a fast filesystem? This is by far the slowest.
Well, one common issue is filesystems slowing down if data structures (e.g. directories) get big. Creating a single file doesn't take a constant number of disk operations.
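One way to see that per-file creation cost is not constant is to create files in batches and time each batch as the directory grows. A hedged sketch, with arbitrary batch sizes:

```python
import os
import tempfile
import time

BATCH = 20_000   # assumption: arbitrary batch size
BATCHES = 10

with tempfile.TemporaryDirectory() as d:
    total = 0
    for _ in range(BATCHES):
        t0 = time.perf_counter()
        for i in range(BATCH):
            open(os.path.join(d, f"f{total + i:08d}"), "w").close()
        total += BATCH
        # If creating a file took a constant number of operations, each batch would take about the same time.
        print(f"files {total - BATCH:>7}-{total:>7}: {time.perf_counter() - t0:.2f}s")
```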
Hasn't the filesystem improved to the point where this is less of a problem?
Probably having to check whether the actual disk entries changed is what slows it down. I wonder if it would be possible, with today's memory sizes, to keep all metadata in memory as a write-through cache. Not sure it'd be worth it though; my system has close to half a million files, but I'm only interested in about a hundred or so. I don't think filesystems are slow in practice for typical human-scale operations, though, with the exception of non-indexed "search all my files" operations.
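The kernel already keeps much of this in memory via its dentry and inode caches, but as a purely illustrative user-space sketch of the write-through idea, here is a hypothetical wrapper that memoizes `os.stat` results and refreshes them on writes made through it:

```python
import os

class StatCache:
    """Hypothetical user-space sketch of a write-through metadata cache, keyed by path.

    Reads are served from memory after the first lookup; writes made through this
    wrapper update both the file and the cached stat result. It cannot see changes
    made outside the wrapper, which is the staleness problem the comment raises.
    """

    def __init__(self):
        self._cache = {}

    def stat(self, path):
        if path not in self._cache:
            self._cache[path] = os.stat(path)  # cold miss goes to the filesystem
        return self._cache[path]

    def write(self, path, data: bytes):
        with open(path, "wb") as f:
            f.write(data)
        self._cache[path] = os.stat(path)  # write-through: refresh metadata immediately
```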
Doing it at the block device level, you're gonna have stuff stick around in cache; doing it file-wise would blow things up faster.
Confused what this has to do with calculating file sizes. Time spent computing file sizes is dwarfed by I/O, right?
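That intuition is easy to check by timing the per-entry `stat` calls separately from the arithmetic that sums the sizes. A small sketch, with the directory path left as a placeholder:

```python
import os
import time

def total_size(path):
    """Sum file sizes in a directory, timing metadata I/O and arithmetic separately."""
    sizes = []
    t0 = time.perf_counter()
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                sizes.append(entry.stat().st_size)  # metadata I/O (stat)
    io_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    total = sum(sizes)  # pure computation over numbers already in memory
    compute_s = time.perf_counter() - t0
    return total, io_s, compute_s

if __name__ == "__main__":
    total, io_s, compute_s = total_size(".")  # placeholder path
    print(f"{total} bytes; stat I/O {io_s:.3f}s vs summation {compute_s:.6f}s")
```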
Reading a bunch of small files from a volume is insanely slow for me in Windows, so yeah...