java mysql regex posix mysql-5.7

Verify if a regex is POSIX compatible


I would like to know if there is a way to verify, using Java, that a regular expression is POSIX compatible.

I'm using MySQL 5.7, and I can't use "normal" regular expressions with its REGEXP function:

MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. MySQL uses the extended version to support regular expression pattern-matching operations in SQL statements.

If I try to use some of these tokens, for example:

  • \w
  • \d
  • (?:

They are considered invalid or simply ignored by MySQL. There are probably others as well.

I'm aware the Java Pattern class can be used to verify if a regex is valid using:

Pattern.compile(regex);

This throws a PatternSyntaxException if the regex is invalid. However, as I said, I want to check that the regex uses only POSIX-compatible syntax, so that I can validate the input before saving it to the database.
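To illustrate why the Pattern check alone is not enough, here is a minimal sketch. The class and method names are made up for this example. It shows that java.util.regex accepts Perl-style tokens, so a pattern can pass this check and still be unusable in MySQL 5.7.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, named for illustration only.
final class JavaRegexCheck {

    /** Returns true if java.util.regex can compile the pattern. */
    static boolean isValidJavaRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Valid for Java, although (?:, \d and \w are not POSIX syntax.
        System.out.println(isValidJavaRegex("(?:\\d+)\\w*")); // true
        // Rejected by Java because the group is never closed.
        System.out.println(isValidJavaRegex("(abc"));          // false
    }
}
```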


Solution

  • Syntax like \w, \d, (?:) is supported in Perl-compatible regular expressions (PCRE), not in POSIX. Tools like egrep support enhanced features for compatibility, but that doesn't make those features part of POSIX.

    From the man page for re_format(7):

    ENHANCED FEATURES

    When the REG_ENHANCED flag is passed to one of the regcomp() variants, additional features are activated. Like the enhanced regex implementations in scripting languages such as perl(1) and python(1), these additional features may conflict with the IEEE Std 1003.2 (``POSIX.2'') standards in some ways. Use this with care in situations which require portability (including to past versions of the Mac OS X using the previous regex implementation).

    There's a distinction between "extended" and "enhanced." Extended refers to one of the two levels of POSIX regular expression syntax (basic and extended). Enhanced refers to syntax that is supported by PCRE but not by POSIX.
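If input does have to be screened, the distinction can be approximated in code. The following is only a heuristic sketch, not a POSIX parser, and the class and method names are hypothetical. It flags two kinds of construct that POSIX extended syntax leaves undefined: a backslash followed by a letter or digit, and a group that opens with "(?". Other non-POSIX features, such as lazy quantifiers, are not detected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named for illustration only.
final class PosixTokenCheck {

    /**
     * Returns the suspicious tokens found in the pattern, in order.
     * An empty list means nothing was flagged; it does not prove
     * that the pattern is valid POSIX.
     */
    static List<String> findNonPosixTokens(String regex) {
        List<String> found = new ArrayList<>();
        boolean inBracket = false; // rough tracking of [...] expressions
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                // Consume the escaped character; flag \w, \d, \b, \1, ...
                char next = regex.charAt(++i);
                if (Character.isLetterOrDigit(next)) {
                    found.add("\\" + next);
                }
            } else if (inBracket) {
                if (c == ']') {
                    inBracket = false;
                }
            } else if (c == '[') {
                inBracket = true;
            } else if (regex.startsWith("(?", i)) {
                // (?:...), (?=...), (?i) and similar Perl-style groups.
                found.add("(?");
                i++;
            }
        }
        return found;
    }
}
```

For example, findNonPosixTokens("(?:\\d+)\\w*") reports (?, \d and \w, while a pattern written with [[:digit:]] and [[:alnum:]_] reports nothing.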

    You can do many of the things you want in POSIX syntax:

    • For \w, use [[:alnum:]_].

    • For \d, use [[:digit:]].

    • The (?:) syntax is unnecessary, because MySQL REGEXP doesn't support capturing groups anyway. You can simply use () for grouping.
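The three substitutions above could be applied mechanically. This is a rough sketch under stated assumptions: the class and method names are invented, only these three tokens are handled, and bracket expressions are tracked loosely. It is not a general PCRE-to-POSIX converter.

```java
// Hypothetical helper, named for illustration only.
final class PosixRewrite {

    /** Rewrites \w, \d and (?: into the POSIX forms listed above. */
    static String toPosix(String regex) {
        StringBuilder out = new StringBuilder();
        boolean inBracket = false; // true while inside [...]
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i);
                if (next == 'w') {
                    // Inside [...] only the class name is needed.
                    out.append(inBracket ? "[:alnum:]_" : "[[:alnum:]_]");
                } else if (next == 'd') {
                    out.append(inBracket ? "[:digit:]" : "[[:digit:]]");
                } else {
                    // Any other escape is copied through unchanged.
                    out.append('\\').append(next);
                }
            } else if (!inBracket && regex.startsWith("(?:", i)) {
                out.append('('); // plain grouping replaces (?:
                i += 2;
            } else {
                if (c == '[') {
                    inBracket = true;
                } else if (c == ']') {
                    inBracket = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, toPosix("(?:\\d+)\\w*") yields ([[:digit:]]+)[[:alnum:]_]* and toPosix("[\\d.]") yields [[:digit:].]. The result should still be tested against MySQL itself before it is relied on.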

    I don't think it's necessary to use a Java validator to parse your regular expressions. You should be able to read the documentation and use only features that appear in that doc.

    I mean, really, regular expression syntax is not that complex. You could create a quick-reference sheet on a Post-It note.