Regex Backtracking vs Theory

Comments debate how modern regex engines like PCRE with backtracking and backreferences deviate from theoretical regular languages and finite automata, causing performance issues like ReDoS, contrasted with linear-time alternatives like RE2.

📉 Falling 0.2x Programming Languages
1,639
Comments
19
Years Active
5
Top Authors
#1433
Topic ID

Activity Over Time

2007
1
2008
10
2009
35
2010
51
2011
74
2012
51
2013
92
2014
50
2015
114
2016
141
2017
125
2018
95
2019
131
2020
113
2021
119
2022
154
2023
103
2024
123
2025
57

Keywords

e.g US RE2 mozilla.org UTS RE lang.org PCRE NP reddit.com regular expressions regular regex expressions perl expression languages engines matching matches

Sample Comments

slimsag May 27, 2021 View on HN

Regex engines don't just parse regular languages - that ship sailed a long time ago. As I understand it, any regex engine supporting backtracking is no longer working with regular languages.As the author of Perl's regex engine Larry Wall puts it[0]:> “Regular expressions” […] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I’m not going to try to fight linguistic necessity here.

llimllib Nov 19, 2014 View on HN

Regexps with backtracks are not regular languages (IIRC, right?), does that matter here?

wahern Sep 17, 2019 View on HN

That's only for regular languages. Backreferences aren't regular.

WildUtah Dec 24, 2015 View on HN

These regexes aren't really regular expressions. They're PERL regexes with backreferences so a DFA won't do.

wyager Jun 21, 2015 View on HN

Perl-compatible regular expressions are not semantically equivalent to real regular expressions (as the article seems to claim). In fact, they correspond to completely different grammars in the Chomsky hierarchy. PCREs are Turing complete (and for closely related reasons, extremely slow for some expressions), while regular expressions are isomorphic to FSMs (which are always linear time in the length of their input).

michaelfairley Jul 20, 2010 View on HN

Regular expressions in modern programming languages and the regular expressions you learned about in your automata theory class are not the same thing. The back references present in the regular expressions discussed in the article make the entire system capable of handling even NP-complete problems. So yes, regular expressions (in the more common use of the term) can "parse".

burntsushi Nov 2, 2024 View on HN

No. I've written about why: https://blog.burntsushi.net/ripgrep/That blog is eight years old now, but still pretty accurate.For more details at the level of the regex engine, see: https://github.com/BurntSushi/rebar

eru May 27, 2021 View on HN

Proper regular expressions don't need backtracking.

woodruffw Dec 29, 2022 View on HN

This is not actionable general-purpose advice: many languages have regular expression APIs built around PCRE, which is much more of a "general advanced pattern matching" language than just pure regular expressions. Many of PCRE's features (like backreferences) are widely used and allow for parsing of non-regular inputs, meaning that you can't transform them into a linearly-evaluable regular expression.

MaxBarraclough Mar 13, 2020 View on HN

Regex with backreferences isn't really regex at all, in that it no longer corresponds to the regular languages.