Unix Text Processing Pipelines
Discussions revolve around using shell command-line tools like sort, uniq, cut, awk, and grep in pipelines for tasks such as deduplication, sorting, and frequency counting of words or lines, often comparing them to scripts in languages like Perl or Python.
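The recurring pattern behind most of these threads is the line-frequency pipeline. A minimal sketch, assuming a plain line-oriented input file (the name access.log is just a placeholder):

    # Show the ten most frequent lines, each prefixed with its count.
    # "access.log" is a placeholder; any line-oriented input works.
    sort access.log | uniq -c | sort -rn | head -n 10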
Sample Comments
cat | cut -d | sort | uniq or when in doubt, just write a few lines of perl.
Sort lines of text | filter to only print unique lines | print just the first word of each line | filter to print just the unique lines and the count | sort the counts numerically | print the top 10. Given a bunch of lines of text, this reports on the frequency of starting words. Probably the -C should be lower case?
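Read literally, those steps correspond to a pipeline roughly like the one below. This is a reconstruction, not the commenter's actual code: the file name is a placeholder, awk is assumed for the "first word" step, an extra sort is added so identical words are adjacent before counting, and sort -rn | head is assumed for "print the top 10".

    # Frequency of the first word on each line, ten most common first.
    # "input.txt" is a placeholder file name.
    sort input.txt |        # sort whole lines
        uniq |              # keep only unique lines
        awk '{print $1}' |  # print just the first word of each line
        sort |              # regroup identical first words before counting
        uniq -c |           # print each unique word with its count
        sort -rn |          # sort the counts numerically, largest first
        head -n 10          # print the top 10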
depending on what you're doing, awk / sort / uniq / specialized scripts in ruby / sqlite / R. Provide more details.
a set replaces 'sort -u' or 'sort | uniq'. A dictionary replaces 'sort | uniq -c'
Sure it can, but so can one python script. For that matter, so can the four or five pipes. What's so wrong with `grep regex myFile | sort -u | cut -d ',' -f 3` etc.?
I tried a perl script versus the shell pipeline; see my other comments in the thread. It's significantly faster, because using sort and uniq -c is a pretty high-effort way to count words, especially for big lists. Your words seem to work fine with it; it's just splitting on whitespace.
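A single-pass counter is the usual alternative this comment points at: instead of sorting every word so that uniq -c can count runs, keep a running tally. A sketch of the idea in awk rather than perl (the file name is a placeholder):

    # Count word frequencies in one pass, splitting on whitespace, then rank by count.
    # "input.txt" is a placeholder file name.
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print count[w], w }' input.txt | sort -rn | head -n 10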
The "namecount" example is rather silly though. It can be solved by a shell script much shorter than one line (sort|uniq -c).
My intuitions start with: cut, wc, sort, uniq
isn't the "frequency" perl script the same as "sort | uniq -c" ?
I was wondering if there's a CLI that can replace sort | uniq -c | sort -nr. I feel like this is a very commonly used pattern.
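As far as the standard coreutils go, there doesn't appear to be a single command for that pattern, but a small shell function makes it feel like one (the name freq here is arbitrary):

    # Wrap the common "frequency table" pattern in a single command.
    # The function name "freq" is arbitrary.
    freq() {
        sort "$@" | uniq -c | sort -nr
    }
    # Usage: freq words.txt    or    some_command | freq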