String Search Algorithms

Discussions focus on efficient algorithms and data structures for string matching and searching, including Boyer-Moore, Aho-Corasick, Levenshtein distance, tries, suffix trees, and related techniques suggested as alternatives or improvements.

📉 Falling 0.4x Other

3,406

Comments

Years Active

Top Authors

#4029

Topic ID

Activity Over Time

2007

2008

2009

2010

151

2011

174

2012

155

2013

153

2014

179

2015

256

2016

172

2017

190

2018

235

2019

257

2020

197

2021

245

2022

274

2023

245

2024

196

2025

202

2026

Top Contributors

burntsushi (44) MattPalmer1086 (23) rurban (21) amelius (17) kragen (15)

Keywords

e.g innerjoin.bit LSH NFKC en.m FM OG MemoTrie www.gnu github.com algorithm string search moore searching distance recursive aaa implementation nodes

Sample Comments

bluecalm • Aug 22, 2025 • View on HN

What about implementing text algorithms like prefix search or a suffix tree to mention the simplest ones? Don't you need a string length at various points there?

eboyjr • Aug 1, 2018 • View on HN

The Boyer–Moore string-search algorithm?

fleitz • Aug 30, 2013 • View on HN

4MB? Wouldn't boyer-moore, or a trie be very efficient for this?

nafey • Dec 4, 2020 • View on HN

Wouldn't using something like trie be useful here?

codesink • May 14, 2009 • View on HN

the boyer-moore algorithm should have been used. it would probably outperform any other given implementation for this kind of problems.

rogaos • Oct 31, 2019 • View on HN

Cool! Seems similar to the Aho-Corasick algorithm, which was designed for exactly this task (https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm)

deepanwadhwa • Mar 11, 2025 • View on HN

Is ahocorasick in a different plane than this?

ahilss • Nov 20, 2010 • View on HN

I recently used this algorithm for speeding up an iTunes style search function. Originally I did a strstr over every item in the database, but it wasn't quite fast enough. I precomputed a 32-bit mask - 1 bit per alphabetic char, 5 bits per digits, and the last bit for special characters - for every database item, and used that as an initial search filter. I only needed to use the more costly strstr on items that made it through this filter.

siculars • Jan 12, 2011 • View on HN

Grats donohoe, well done. What is of particular interest to me here is the use of the Levenshtein distance algorithm. The reason this works well here is because you are comparing your supplied key against a constrained set. Applying the Levenshtein distance algorithm (or its variants) against a constrained set of small size in this fashion has virtually no performance impact as the time to complete is entirely based on the size of the set you are matching against. On the other hand, matching aga

hawski • Dec 19, 2018 • View on HN

Pardon if I'm not well informed, that's why I ask. Is the problem similar to string search with multiple patterns? Could Rabin-Karp algorithm be used for searching? It also uses hashing. Or could this algorithm be used instead of Rabin-Karp?