Search code examples
.netregexcomputer-sciencetheory

Generate random sequences matching a given regular expression


Having an arbitrary regular expression how do i generate random text sequences matching (or not matching) a given regular expression? I'm interested in both theory and practical usage (need this in context of .NET to improve my tests) so links to articles or existing libraries would be appreciated.


Solution

  • From a purely theoretical standpoint, regular expressions are equivalent to rational languages, which are constructed on the following basis:

    • {} (a language with no words) is rational.
    • {a} (a language with a single one-word letter) is rational.
    • If L and M are two rational languages, then their union (the words either in L or in M) is rational.
    • If L and M are two rational languages, then their concatenation LM (words constructed by prepending a word from L to a word from M) is also rational.
    • If L is a rational language, then L* (words constructed by concatenating any number of words from language L) is also rational.

    This constructive approach complements the regular expression recognition/matching approach, and helps construct recursively words that match the expression:

    • make({}) = ""
    • make({a}) = "a"
    • make(A|B) = flip-coin ? make(A) : make(B)
    • make(AB) = make(A) + make(B)
    • make(A*) = flip-coin ? "" : make(A) + make(A*)