
Regex expression format is different in AEM dispatcher


When a regex contains a forward slash, we need to put a backslash before it, because the forward slash is the regex delimiter and an unescaped one would end the pattern. For example, if I want my regex to match /content/att, I need to write it as /content\/att. And this works.

But when we add a dispatcher rule in AEM to allow a URL path, the backslash before the forward slash is not needed. I would appreciate it if someone could help me understand why we need the backslash when we write the regex, but not when we use the same regex in the URL path of a dispatcher rule.

In the dispatcher rule, look at the URL path: there is no backslash before /att.

/type "allow"
/url "/content/att"
/extension '(gif)'
}

Solution

  • Short answer: because these are two different types of regex representations.

    Long answer:

    Historically, regexes first appeared in text editors like QED and ed. There, regexes were used for string substitutions (search and replace). The tools needed some way to distinguish the search regex from the replacement string, which is why we got the delimiter. A command to replace some text in ed, for example, would be s«DELIMITER»search-regex«DELIMITER»substitution-string«DELIMITER»flags.

    Most single-char delimiters would work but / was often chosen. Of course, it was possible to use the delimiter as part of the regex or the substitution, in which case it would have to be escaped using backslash.

    Some programming languages have codified / as the de facto standard delimiter for regex literals. JavaScript is an example of this.

    Now, usages that have no need to separate a regex from a substitution string (or to allow for regex flags) don’t use delimiters at all. Such is the case in Java, where there are no regex literals; regexes are always created from a string using the Pattern class. That is why, in AEM, you don’t need to escape the /.

    You didn’t show us your Apache dispatcher config file, so I’m not sure where you’re escaping the slash there. I know that Apache’s mod_rewrite also doesn’t use delimited regexes.
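
To make the difference between delimited and string-based regexes concrete, here is a minimal sketch in Python, which (like Java) has no regex literals and builds regexes from strings. It is only an illustration: `split_delimited` is a hypothetical helper, not how ed or the dispatcher actually parses anything, and the sample path `/content/att/logo.gif` is made up.

```python
import re

# String-based regex (no delimiters): a forward slash is an ordinary character.
plain = re.compile("/content/att")

# Escaping the slash is allowed but redundant; it matches the same text.
escaped = re.compile(r"\/content\/att")


def split_delimited(command):
    """Split a simplified ed-style 's<d>regex<d>replacement<d>flags' command.

    Hypothetical helper for illustration only; real ed/sed parsing has more rules.
    """
    if len(command) < 2 or command[0] != "s":
        raise ValueError("expected a command starting with 's'")
    delimiter = command[1]
    fields, current = [], []
    i = 2
    while i < len(command):
        ch = command[i]
        if ch == "\\" and i + 1 < len(command):
            nxt = command[i + 1]
            # An escaped delimiter becomes a literal character of the field;
            # any other escape is kept unchanged for the regex engine.
            current.append(nxt if nxt == delimiter else ch + nxt)
            i += 2
            continue
        if ch == delimiter and len(fields) < 2:
            # An unescaped delimiter ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if len(fields) != 2:
        raise ValueError("expected two delimited fields")
    return fields[0], fields[1], "".join(current)
```

With this sketch, `split_delimited(r"s/\/content\/att/X/g")` recovers the pattern `/content/att`, and choosing another delimiter, as in `split_delimited("s|/content/att|X|")`, removes the need for backslashes altogether. Without the escapes, `split_delimited("s//content/att/X/")` reads the slashes in the path as delimiters and splits the command in the wrong places, which is the only reason the backslash is needed in the delimited form.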