Regex Backtracking vs Theory
Comments debate how modern regex engines like PCRE with backtracking and backreferences deviate from theoretical regular languages and finite automata, causing performance issues like ReDoS, contrasted with linear-time alternatives like RE2.
Activity Over Time
Top Contributors
Keywords
Sample Comments
Regex engines don't just parse regular languages - that ship sailed a long time ago. As I understand it, any regex engine supporting backtracking is no longer working with regular languages.As the author of Perl's regex engine Larry Wall puts it[0]:> “Regular expressions” […] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I’m not going to try to fight linguistic necessity here.
Regexps with backtracks are not regular languages (IIRC, right?), does that matter here?
That's only for regular languages. Backreferences aren't regular.
These regexes aren't really regular expressions. They're PERL regexes with backreferences so a DFA won't do.
Perl-compatible regular expressions are not semantically equivalent to real regular expressions (as the article seems to claim). In fact, they correspond to completely different grammars in the Chomsky hierarchy. PCREs are Turing complete (and for closely related reasons, extremely slow for some expressions), while regular expressions are isomorphic to FSMs (which are always linear time in the length of their input).
Regular expressions in modern programming languages and the regular expressions you learned about in your automata theory class are not the same thing. The back references present in the regular expressions discussed in the article make the entire system capable of handling even NP-complete problems. So yes, regular expressions (in the more common use of the term) can "parse".
No. I've written about why: https://blog.burntsushi.net/ripgrep/That blog is eight years old now, but still pretty accurate.For more details at the level of the regex engine, see: https://github.com/BurntSushi/rebar
Proper regular expressions don't need backtracking.
This is not actionable general-purpose advice: many languages have regular expression APIs built around PCRE, which is much more of a "general advanced pattern matching" language than just pure regular expressions. Many of PCRE's features (like backreferences) are widely used and allow for parsing of non-regular inputs, meaning that you can't transform them into a linearly-evaluable regular expression.
Regex with backreferences isn't really regex at all, in that it no longer corresponds to the regular languages.