String Search Algorithms
Discussions focus on efficient algorithms and data structures for string matching and searching, including Boyer-Moore, Aho-Corasick, Levenshtein distance, tries, suffix trees, and related techniques suggested as alternatives or improvements.
Activity Over Time
Top Contributors
Keywords
Sample Comments
What about implementing text algorithms like prefix search or a suffix tree to mention the simplest ones? Don't you need a string length at various points there?
The Boyer–Moore string-search algorithm?
4MB? Wouldn't boyer-moore, or a trie be very efficient for this?
Wouldn't using something like trie be useful here?
the boyer-moore algorithm should have been used. it would probably outperform any other given implementation for this kind of problems.
Cool! Seems similar to the Aho-Corasick algorithm, which was designed for exactly this task (https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm)
Is ahocorasick in a different plane than this?
I recently used this algorithm for speeding up an iTunes style search function. Originally I did a strstr over every item in the database, but it wasn't quite fast enough. I precomputed a 32-bit mask - 1 bit per alphabetic char, 5 bits per digits, and the last bit for special characters - for every database item, and used that as an initial search filter. I only needed to use the more costly strstr on items that made it through this filter.
Grats donohoe, well done. What is of particular interest to me here is the use of the Levenshtein distance algorithm. The reason this works well here is because you are comparing your supplied key against a constrained set. Applying the Levenshtein distance algorithm (or its variants) against a constrained set of small size in this fashion has virtually no performance impact as the time to complete is entirely based on the size of the set you are matching against. On the other hand, matching aga
Pardon if I'm not well informed, that's why I ask. Is the problem similar to string search with multiple patterns? Could Rabin-Karp algorithm be used for searching? It also uses hashing. Or could this algorithm be used instead of Rabin-Karp?