Tags: php, regex, modx, modx-revolution

Regex to deny special Norwegian letters in friendly URL - MODX


I'm developing a page using MODX Revolution. It's a complete CMS with a lot of built-in functions. If I create a page in the manager, it automatically produces a friendly URL pointing to that page.

The problem is that it does not deny the special characters we have in Norway, æøå (and uppercase ÆØÅ).

The system has a built-in regex pattern that strips most bad characters from the URL, but I need the expression to strip æøå and ÆØÅ too.

The pattern looks like this:

/[\0\x0B\t\n\r\f\a&=+%#<>"~:`@\?\[\]\{\}\|\^'\\]/

Can anyone use their magic regex knowledge to include these 6 letters? I am totally green at regex, and simply adding the letters to the pattern did not seem to work.

PS: Please don't give the common "boo, don't use regex for this" answer here. The pattern is there for a reason, and I don't want to mess around with the core, because we will probably have to upgrade MODX sooner or later.


Solution

  • Try to use Unicode. I don't know MODX, but since it's written in PHP, I hope it uses PHP's preg (PCRE) regular expressions.

    /[\0\x0B\t\n\r\f\a&=+%#<>"~:`@\?\[\]\{\}\|\^'\\\x{00C6}\x{00E6}\x{00C5}\x{00E5}\x{00D8}\x{00F8}]/u
    

    The u modifier tells PHP to use Unicode matching mode: the pattern and the subject string are then both treated as UTF-8 instead of as sequences of single bytes.

    \x{00C6} is the Unicode character Æ

    Please check the codes of the other characters yourself to make sure I didn't make a mistake while looking them up.

    See regular-expressions.info for Unicode usage in PHP.

    See Unicode.org for the code points.
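To try the pattern outside MODX, a minimal sketch along these lines may help. It assumes PHP 7 or later and a source file saved as UTF-8. The helper strip_restricted_chars() is a hypothetical name made up for this example, not a MODX function. The pattern sits in a nowdoc so that PHP does no escaping of its own and it reads exactly as written above. The loop checks each \x{...} escape against the letter it is meant to represent.

```php
<?php
// Pattern from the answer, in a nowdoc so PHP leaves every backslash untouched.
$pattern = <<<'RE'
/[\0\x0B\t\n\r\f\a&=+%#<>"~:`@\?\[\]\{\}\|\^'\\\x{00C6}\x{00E6}\x{00C5}\x{00E5}\x{00D8}\x{00F8}]/u
RE;

// Hypothetical helper (not part of MODX): remove every character the pattern matches.
function strip_restricted_chars(string $alias, string $pattern): string
{
    // With the u modifier, preg_replace() returns null if $alias is not valid UTF-8;
    // in that case this sketch simply hands the input back unchanged.
    $result = preg_replace($pattern, '', $alias);
    return $result ?? $alias;
}

// Verify that each code point really matches the intended letter.
$codePoints = [
    'Æ' => 0x00C6, 'æ' => 0x00E6,
    'Å' => 0x00C5, 'å' => 0x00E5,
    'Ø' => 0x00D8, 'ø' => 0x00F8,
];
foreach ($codePoints as $letter => $codePoint) {
    $single = sprintf('/^\x{%04X}$/u', $codePoint);
    $status = preg_match($single, $letter) === 1 ? 'ok' : 'MISMATCH';
    printf("%s U+%04X %s\n", $letter, $codePoint, $status);
}

echo strip_restricted_chars('blåbærsyltetøy', $pattern), "\n"; // blbrsyltety
```

Note that the letters are removed, not transliterated, so "blåbærsyltetøy" becomes "blbrsyltety". If the input might not be valid UTF-8, it is worth checking preg_last_error() after the call.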