Unix Text Processing Pipelines

Discussions revolve around using shell command-line tools like sort, uniq, cut, awk, and grep in pipelines for tasks such as deduplication, sorting, and frequency counting of words or lines, often comparing them to scripts in languages like Perl or Python.

📉 Falling 0.2x · Programming Languages
Comments: 1,440
Years active: 19
Top authors: 5
Topic ID: #4239

Activity Over Time (comments per year)

Year  Comments
2007  4
2008  12
2009  13
2010  70
2011  57
2012  66
2013  126
2014  113
2015  68
2016  60
2017  72
2018  77
2019  146
2020  100
2021  114
2022  93
2023  120
2024  84
2025  45

Keywords

e.g, uniq.html, SED, CLI, p.s, OK, MonoRuns, APL, beef.txt, BQN, sort, grep, print, awk, words, txt, lines, script, count, perl

Sample Comments (from Hacker News)

anon176 Jan 13, 2020

`cat | cut -d | sort | uniq`, or when in doubt, just write a few lines of Perl.

rsclient Mar 28, 2021

Sort lines of text | filter to only print unique lines | print just the first word of each line | filter to print just the unique lines and the count | sort the counts numerically | print the top 10. Given a bunch of lines of text, this reports on the frequency of starting words. Probably the `-C` should be lowercase?
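The stages glossed in that comment can be sketched in Python. This is a hedged reading, since the pipeline itself is not quoted: it assumes words are separated by runs of whitespace (which differs from `cut -d ' ' -f 1` on lines with leading or repeated spaces) and that "top 10" means the ten highest counts. Because lines are deduplicated first, each count is the number of distinct lines starting with that word. The function name `top_first_words` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_first_words(lines: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Rank first words by how many *distinct* lines start with them.

    Hypothetical helper mirroring the described stages:
    sort | uniq, first word of each line, uniq -c, numeric sort, top n.
    """
    # sort | uniq: keep each distinct line once
    unique_lines = set(line.rstrip("\n") for line in lines)
    first_words = []
    for line in unique_lines:
        words = line.split()  # assumption: whitespace-separated words
        if words:  # blank lines have no first word
            first_words.append(words[0])
    # uniq -c, sort by count (descending), keep the top n
    return Counter(first_words).most_common(n)


if __name__ == "__main__":
    sample = [
        "foo bar",
        "foo baz",
        "foo bar",  # duplicate line, counted once after deduplication
        "qux quux",
    ]
    print(top_first_words(sample))  # [('foo', 2), ('qux', 1)]
```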

x0x0 Mar 31, 2014

Depending on what you're doing: awk / sort / uniq / specialized scripts in Ruby / SQLite / R. Provide more details.

justinsaccount Jan 17, 2016

A set replaces 'sort -u' or 'sort | uniq'. A dictionary replaces 'sort | uniq -c'.
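A minimal Python sketch of the two equivalences in that comment. A set or dict does not keep the sorted order that `sort -u` and `uniq -c` produce, so the sketch sorts on output; Python's default string ordering corresponds to `LC_ALL=C sort`, not a locale-aware sort. The function names are illustrative only.

```python
from typing import Iterable


def sort_unique(lines: Iterable[str]) -> list[str]:
    """Like `sort -u` (or `sort | uniq`): the set does the deduplication."""
    return sorted(set(lines))


def sort_uniq_count(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Like `sort | uniq -c`: a dict maps each distinct line to its count."""
    counts: dict[str, int] = {}
    for line in lines:
        counts[line] = counts.get(line, 0) + 1
    # uniq -c emits (count, line) pairs in sorted-line order
    return [(counts[line], line) for line in sorted(counts)]


if __name__ == "__main__":
    data = ["pear", "apple", "pear", "fig", "apple", "pear"]
    print(sort_unique(data))      # ['apple', 'fig', 'pear']
    print(sort_uniq_count(data))  # [(2, 'apple'), (1, 'fig'), (3, 'pear')]
```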

AnthonyMouse Apr 12, 2020

Sure it can, but so can one python script. For that matter, so can the four or five pipes. What's so wrong with `grep regex myFile | sort -u | cut -d ',' -f 3` etc.?
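For comparison, a rough Python rendering of `grep regex myFile | sort -u | cut -d ',' -f 3`, working on a list of lines instead of a file. Assumptions: the pattern is a Python `re` pattern (grep's default basic regular expressions use different syntax), and sorting is code-point order as under `LC_ALL=C`. The function name `grep_sort_u_cut` is hypothetical.

```python
import re
from typing import Iterable


def grep_sort_u_cut(lines: Iterable[str], pattern: str, field: int = 3) -> list[str]:
    """Roughly `grep PATTERN | sort -u | cut -d ',' -f FIELD` (hypothetical helper)."""
    regex = re.compile(pattern)
    # grep: keep lines with a match anywhere in them (search, not match)
    matched = (line for line in lines if regex.search(line))
    # sort -u: deduplicate whole lines, then sort them
    unique_sorted = sorted(set(matched))
    out = []
    for line in unique_sorted:
        if "," not in line:
            # cut passes through lines without the delimiter (unless -s is given)
            out.append(line)
            continue
        fields = line.split(",")
        # cut prints an empty line when the requested field does not exist
        out.append(fields[field - 1] if len(fields) >= field else "")
    return out
```

Because `cut` runs after `sort -u`, the output can still repeat the same third field when whole lines differ; running `cut` before `sort -u` would deduplicate the extracted values instead.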

tyingq Jun 5, 2023

I tried a Perl script versus the shell pipeline (see my other comments in the thread). It's significantly faster, because using sort and uniq -c is a pretty high-effort way to count words, especially for big lists. Your words seem to work fine with it; it's just splitting on whitespace.
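The cost difference the comment points to can be illustrated in Python: counting the way `sort | uniq -c` does means sorting every word and then grouping adjacent equal runs (O(n log n)), while a hash-based loop, like a typical Perl `$count{$word}++`, makes a single expected O(n) pass. This sketch splits on whitespace and checks that both approaches agree; it is not a reproduction of the commenter's script or benchmark, and the function names are illustrative.

```python
from collections import Counter
from itertools import groupby


def count_by_sorting(text: str) -> dict[str, int]:
    """Like `sort | uniq -c` on one word per line: sort, then count adjacent runs."""
    words = sorted(text.split())
    # groupby yields each run of equal, adjacent words once
    return {word: sum(1 for _ in run) for word, run in groupby(words)}


def count_by_hashing(text: str) -> dict[str, int]:
    """Like a hash-based counting loop: one pass, no sort."""
    return dict(Counter(text.split()))


if __name__ == "__main__":
    text = "the cat and the hat\nand the bat"
    print(count_by_sorting(text) == count_by_hashing(text))  # True
```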

enriquto Aug 24, 2016

The "namecount" example is rather silly though. It can be solved by a shell script much shorter than one line (sort|uniq -c).

mjburgess Nov 15, 2023

My intuitions start with: cut, wc, sort, uniq

riffraff Oct 10, 2010

Isn't the "frequency" Perl script the same as "sort | uniq -c"?

yegle Jun 5, 2022

I was wondering if there's a CLI that can replace sort | uniq -c | sort -nr. I feel like this is a very commonly used pattern.
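One way to get that pattern as a single command is a small filter script. The Python sketch below stands in for `sort | uniq -c | sort -nr`, printing rows in a layout modeled on GNU `uniq -c` (count right-aligned in a 7-character column, then a space and the line). The names `ranked_counts`, `format_like_uniq_c` and `main` are hypothetical. Equal counts are broken alphabetically here, which may not match the order `sort -nr` gives for ties.

```python
import sys
from collections import Counter
from typing import Iterable, TextIO


def ranked_counts(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Count identical lines and rank them, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # Highest count first; equal counts fall back to alphabetical order
    return sorted(((n, line) for line, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))


def format_like_uniq_c(ranked: Iterable[tuple[int, str]]) -> list[str]:
    """Render rows with the count right-aligned in 7 columns, then the line."""
    return [f"{n:7d} {line}" for n, line in ranked]


def main(stream: TextIO = sys.stdin) -> None:
    """Filter entry point: read lines from stream, print ranked counts."""
    for row in format_like_uniq_c(ranked_counts(stream)):
        print(row)
```

Calling `main()` under an `if __name__ == "__main__":` guard turns this into a filter, e.g. `python3 rank.py < input.txt`, where `rank.py` and `input.txt` are placeholder names.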