
PHP preg_match() PCRE logic issue?


Consider the following:

$lat = '89° 5'; // works
if(preg_match('/^(([0-8]\d|\d)°?(\s?([0-5]\d|\d))?)(N|S)?$/', $lat, $la)){
  $ck = 'DD° MM format --> ';
}
else{
  $test = 'invalid $lat format';
}
if(isset($ck)){
  $test = $ck.$la[0];
}
echo $test;

When $lat = '89°5', everything works fine too. What I'm trying to understand is why $lat = '89 5' fails. Maybe my brain isn't working, but it seems that last one should not be an invalid format, because of the °? in the pattern. Thanks for helping me understand.


Solution

  • Use /(*UTF8)^(([0-8]\d|\d)°?(\s?([0-5]\d|\d))?)(N|S)?$/

    From http://www.pcre.org/pcre.txt:

    In order process UTF-8 strings, you must build PCRE's 8-bit library with UTF support, and, in addition, you must call pcre_compile() with the PCRE_UTF8 option flag, or the pattern must start with the sequence (*UTF8) or (*UTF). When either of these is the case, both the pattern and any subject strings that are matched against it are treated as UTF-8 strings instead of strings of individual 1-byte characters.

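    So the PCRE engine was still seeing ° as two separate one-byte characters (0xC2 0xB0 in UTF-8), and the ? was only making the second one optional. The first byte was still required, which is why '89 5' fails.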

    Note: Interestingly, on my install I was only able to get the expected results using the (lowercase) u modifier. See http://php.net/manual/en/reference.pcre.pattern.modifiers.php.

    Note 2: My original comment had two options; don't use the other one, as it breaks the test that currently works for you.
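
To see the difference for yourself, here is a minimal sketch that runs the pattern from the question in byte mode and with the u modifier. The variable names ($deg, $body, $inputs, $results) are made up for this example, and the degree sign is written as explicit bytes so the result doesn't depend on how the file is saved. It assumes PHP's PCRE was built with UTF-8 support, which is the usual case.

```php
<?php
// Sketch only: $deg, $body, $inputs and $results are names invented for this example.
$deg  = "\xC2\xB0"; // the UTF-8 encoding of the degree sign: two bytes
$body = '^(([0-8]\d|\d)' . $deg . '?(\s?([0-5]\d|\d))?)(N|S)?$';

$inputs  = ['89' . $deg . ' 5', '89' . $deg . '5', '89 5'];
$results = [];

foreach ($inputs as $lat) {
    $results[$lat] = [
        // No modifier: the pattern is read byte by byte, so ? applies to 0xB0 only.
        'bytes' => preg_match('/' . $body . '/', $lat),
        // u modifier: pattern and subject are UTF-8, so ? applies to the whole degree sign.
        'utf8'  => preg_match('/' . $body . '/u', $lat),
    ];
    printf("%-8s bytes=%d utf8=%d\n", $lat, $results[$lat]['bytes'], $results[$lat]['utf8']);
}
```

Both inputs that contain ° match either way. '89 5' only matches with the u modifier, because in byte mode the leading 0xC2 byte is still mandatory.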