Similar to this question on a private forum: Why don’t RegEx implementations support intersection and complement?
Finite automatas built from regular expressions are closed under the set operations union, intersection, complement, and difference. These FA are closed under the two additional operations concatenation and Kleene star.
Anecdotally, concatenation, union, and star operations are ubiquitous in regular expression implementations. Why don't regular expression engines typically support the other set operations intersection, complement, and difference?
An example FA demonstrating the intersection of the two FA substring 01
and odd number of 1s
from these lecture notes.
Scott Aaronson. 6.045J Automata, Computability, and Complexity. Spring 2011. Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.
Regex engines that support lookarounds let you implement intersection, complement and difference :
(?=pattern1)pattern2
will match a string that both pattern1 and pattern2 match(?!pattern).*
will match anything that isn't matched by pattern (although more realistically you'd use pattern
as a regex and have your higher-level environment reverse the match result)(?!pattern1)pattern2
will match a string that is matched by pattern2 but not by pattern1However lookarounds are a rather recent feature in the history of regular expressions and still aren't supported by many regex engines. Why is that?
I'm not well versed in regex's history, but if I believe a cursory glance at Wikipedia articles, they originate from mathematician Stephen Cole Kleene's definition of regular languages which is only based on the union, concatenation and Kleene star operations, which might explain why those are the basic operations in regular expressions.