regex, go, glob

Regex to Glob and vice-versa conversion


We have a requirement to convert regular expressions to CloudFront-supported glob patterns and vice versa. Any suggestions on how to achieve that, and first of all, is it possible? I'm especially unsure about regex to glob: as I understand it, regex is a kind of superset, so it might not be possible to convert every regex to a corresponding glob.


Solution

  • To convert from a glob, you would need to write a parser that splits the pattern into an abstract syntax tree (AST). For example, the glob *-{[0-9],draft}.docx might parse to [Anything(), "-", OneOf([Range("0", "9"), "draft"]), ".docx"].

    Then you would walk the AST and output the equivalent regular expression for each node. For example, the rules might be:

    Anything()  -> .*
    Range(x, y) -> [x-y]
    OneOf(x, y) -> (x|y)
    

    resulting in the regular expression .*-([0-9]|draft).docx.

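    That's not quite finished, because you also have to escape any characters that are special in regular expressions: . matches any character, so it must be escaped, giving .*-([0-9]|draft)\.docx. A glob also has to match the whole string, whereas many regex APIs report a match anywhere in the string, so anchor the result as ^.*-([0-9]|draft)\.docx$.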
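    A minimal Go sketch of the parse-then-walk approach described above, under stated assumptions: it handles only *, ?, [...] / [!...] and {a,b} (no backslash escapes, no ] as the first member of a class), the node type names are hypothetical stand-ins for the Anything/Range/OneOf nodes in the example (a bracket expression is kept as one class node holding its text), and it anchors the output with ^ and $ because a glob has to match the whole name. Literal text is escaped with regexp.QuoteMeta, which takes care of the . problem.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type node interface{ regex() string } // one AST element; regex() is its translation
type anything struct{}             // *     -> .*
type anyChar struct{}              // ?     -> .
type class struct{ body string }   // [a-z] -> [a-z], [!a-m] -> [^a-m]
type oneOf struct{ alts [][]node } // {a,b} -> (a|b); each alternative is a sequence
type literal struct{ text string } // plain text, escaped with regexp.QuoteMeta

func (anything) regex() string  { return ".*" }
func (anyChar) regex() string   { return "." }
func (l literal) regex() string { return regexp.QuoteMeta(l.text) }
func (c class) regex() string {
	if strings.HasPrefix(c.body, "!") {
		return "[^" + c.body[1:] + "]" // glob negation [!...] is [^...] in regex
	}
	return "[" + c.body + "]"
}
func (o oneOf) regex() string {
	parts := make([]string, len(o.alts))
	for i, alt := range o.alts {
		parts[i] = seqRegex(alt)
	}
	return "(" + strings.Join(parts, "|") + ")"
}
func seqRegex(seq []node) string {
	var b strings.Builder
	for _, n := range seq {
		b.WriteString(n.regex())
	}
	return b.String()
}
// parse reads g from index i. Inside braces it stops at ',' or '}' and returns that index.
func parse(g string, i int, inBrace bool) ([]node, int, error) {
	var seq []node
	var lit strings.Builder
	flush := func() { // turn pending plain text into a literal node
		if lit.Len() > 0 {
			seq = append(seq, literal{lit.String()})
			lit.Reset()
		}
	}
	for i < len(g) {
		switch c := g[i]; {
		case c == '*':
			flush()
			seq, i = append(seq, anything{}), i+1
		case c == '?':
			flush()
			seq, i = append(seq, anyChar{}), i+1
		case c == '[':
			end := strings.IndexByte(g[i+1:], ']')
			if end <= 0 {
				return nil, i, fmt.Errorf("bad character class at %d", i)
			}
			flush()
			seq, i = append(seq, class{g[i+1 : i+1+end]}), i+end+2
		case c == '{':
			flush()
			var alts [][]node
			for done := false; !done; { // i sits on '{' or on the ',' before each alternative
				alt, j, err := parse(g, i+1, true)
				if err != nil {
					return nil, j, err
				}
				if j >= len(g) {
					return nil, j, fmt.Errorf("unclosed { in %q", g)
				}
				alts, i, done = append(alts, alt), j, g[j] == '}'
			}
			seq, i = append(seq, oneOf{alts}), i+1
		case inBrace && (c == ',' || c == '}'):
			flush()
			return seq, i, nil
		default:
			lit.WriteByte(c)
			i++
		}
	}
	flush()
	return seq, i, nil
}
func globToRegex(glob string) (string, error) {
	seq, _, err := parse(glob, 0, false)
	if err != nil {
		return "", err
	}
	return "^" + seqRegex(seq) + "$", nil // anchored: a glob matches the whole name
}

func main() {
	fmt.Println(globToRegex("*-{[0-9],draft}.docx"))
}
```

    For the example glob it should print ^.*-([0-9]|draft)\.docx$ <nil>, which is the escaped and anchored form of the regex derived above.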

    Strictly speaking, not every regular expression can be translated to a glob pattern. Globbing has no general Kleene star: the glob * is equivalent only to .* (any sequence of characters), so even the simple regular expression a* (any number of a characters) has no glob equivalent.
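    Going the other way therefore only works for a subset. As a rough sketch (the function names are hypothetical), Go's regexp/syntax package can parse the regex into its syntax tree, and a walker can translate the nodes that have glob counterparts (literals, ., .*, character classes, and alternation as {a,b}, which only some glob dialects support) while rejecting everything else, including a*. It assumes the regex is meant to match the whole string, so ^ and $ are dropped, and it refuses literals containing glob metacharacters rather than guessing at an escape syntax.

```go
package main

import (
	"fmt"
	"regexp/syntax"
	"strings"
	"unicode"
)

// regexToGlob translates the glob-expressible subset of a regular expression.
// Assumption: the regex should match the whole string, so ^ and $ are dropped.
func regexToGlob(expr string) (string, error) {
	re, err := syntax.Parse(expr, syntax.Perl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := emit(&b, re); err != nil {
		return "", err
	}
	return b.String(), nil
}

func emit(b *strings.Builder, re *syntax.Regexp) error {
	switch re.Op {
	case syntax.OpBeginText, syntax.OpEndText, syntax.OpEmptyMatch: // implicit in a glob
	case syntax.OpLiteral:
		s := string(re.Rune)
		if re.Flags&syntax.FoldCase != 0 || strings.ContainsAny(s, `*?[]{},\`) {
			return fmt.Errorf("literal %q is not handled by this sketch", s)
		}
		b.WriteString(s)
	case syntax.OpAnyChar, syntax.OpAnyCharNotNL:
		b.WriteByte('?')
	case syntax.OpStar: // glob * is exactly .*; a star over anything else has no glob form
		if op := re.Sub[0].Op; op != syntax.OpAnyChar && op != syntax.OpAnyCharNotNL {
			return fmt.Errorf("%s has no glob equivalent", re)
		}
		b.WriteByte('*')
	case syntax.OpCharClass:
		return emitClass(b, re.Rune)
	case syntax.OpCapture:
		return emit(b, re.Sub[0])
	case syntax.OpConcat:
		for _, sub := range re.Sub {
			if err := emit(b, sub); err != nil {
				return err
			}
		}
	case syntax.OpAlternate:
		b.WriteByte('{') // shell-style brace alternation
		for i, sub := range re.Sub {
			if i > 0 {
				b.WriteByte(',')
			}
			if err := emit(b, sub); err != nil {
				return err
			}
		}
		b.WriteByte('}')
	default:
		return fmt.Errorf("%s is not handled by this sketch", re)
	}
	return nil
}
// emitClass writes a bracket expression; [^a-m] arrives as its complement ranges.
func emitClass(b *strings.Builder, r []rune) error {
	open := "["
	if len(r) >= 2 && r[0] == 0 && r[len(r)-1] == unicode.MaxRune {
		var gaps []rune // the excluded runes are the gaps between the ranges
		for i := 1; i+1 < len(r); i += 2 {
			gaps = append(gaps, r[i]+1, r[i+1]-1)
		}
		if len(gaps) == 0 {
			b.WriteByte('?') // the class matches every character
			return nil
		}
		open, r = "[!", gaps
	}
	b.WriteString(open)
	for i := 0; i+1 < len(r); i += 2 {
		lo, hi := r[i], r[i+1]
		if strings.ContainsAny(string([]rune{lo, hi}), `]\-!^[`) {
			return fmt.Errorf("class endpoint %q or %q needs escaping", lo, hi)
		}
		b.WriteRune(lo)
		if hi != lo {
			b.WriteByte('-')
			b.WriteRune(hi)
		}
	}
	b.WriteByte(']')
	return nil
}

func main() {
	for _, expr := range []string{`^foo-.*\.docx$`, `[^a-m].`, `a*`} {
		fmt.Println(regexToGlob(expr))
	}
}
```

    Run as is, it should translate ^foo-.*\.docx$ to foo-*.docx and [^a-m]. to [!a-m]?, and return an error for a*, which is exactly the Kleene-star case above.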

    I'm not sure what types of globs CloudFront supports (a search of its documentation returned no hits for the term "glob"), but here is some documentation on commonly supported shell glob pattern wildcards.

    Here is a summary of some equivalent sequences:

    Glob wildcard    Regular expression    Meaning
    ?                .                     Any single character
    *                .*                    Zero or more characters
    [a-z]            [a-z]                 Any character in the range
    [!a-m]           [^a-m]                Any character not in the range
    [abc]            [abc]                 One of the given characters
    {cat,dog,bat}    (cat|dog|bat)         One of the given options
    {*.tar,*.gz}     (.*\.tar|.*\.gz)      One of the given options, with wildcards inside each option