String Search Algorithms

Discussions focus on efficient algorithms and data structures for string matching and searching, including Boyer-Moore, Aho-Corasick, Levenshtein distance, tries, suffix trees, and related techniques suggested as alternatives or improvements.

📉 Falling 0.4x Other
3,406
Comments
20
Years Active
5
Top Authors
#4029
Topic ID

Activity Over Time

2007
8
2008
30
2009
65
2010
151
2011
174
2012
155
2013
153
2014
179
2015
256
2016
172
2017
190
2018
235
2019
257
2020
197
2021
245
2022
274
2023
245
2024
196
2025
202
2026
22

Keywords

e.g innerjoin.bit LSH NFKC en.m FM OG MemoTrie www.gnu github.com algorithm string search moore searching distance recursive aaa implementation nodes

Sample Comments

bluecalm Aug 22, 2025 View on HN

What about implementing text algorithms like prefix search or a suffix tree to mention the simplest ones? Don't you need a string length at various points there?

eboyjr Aug 1, 2018 View on HN

The Boyer–Moore string-search algorithm?

fleitz Aug 30, 2013 View on HN

4MB? Wouldn't boyer-moore, or a trie be very efficient for this?

nafey Dec 4, 2020 View on HN

Wouldn't using something like trie be useful here?

codesink May 14, 2009 View on HN

the boyer-moore algorithm should have been used. it would probably outperform any other given implementation for this kind of problems.

rogaos Oct 31, 2019 View on HN

Cool! Seems similar to the Aho-Corasick algorithm, which was designed for exactly this task (https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm)

deepanwadhwa Mar 11, 2025 View on HN

Is ahocorasick in a different plane than this?

ahilss Nov 20, 2010 View on HN

I recently used this algorithm for speeding up an iTunes style search function. Originally I did a strstr over every item in the database, but it wasn't quite fast enough. I precomputed a 32-bit mask - 1 bit per alphabetic char, 5 bits per digits, and the last bit for special characters - for every database item, and used that as an initial search filter. I only needed to use the more costly strstr on items that made it through this filter.

siculars Jan 12, 2011 View on HN

Grats donohoe, well done. What is of particular interest to me here is the use of the Levenshtein distance algorithm. The reason this works well here is because you are comparing your supplied key against a constrained set. Applying the Levenshtein distance algorithm (or its variants) against a constrained set of small size in this fashion has virtually no performance impact as the time to complete is entirely based on the size of the set you are matching against. On the other hand, matching aga

hawski Dec 19, 2018 View on HN

Pardon if I'm not well informed, that's why I ask. Is the problem similar to string search with multiple patterns? Could Rabin-Karp algorithm be used for searching? It also uses hashing. Or could this algorithm be used instead of Rabin-Karp?