php, regex, utc

plus-minus (±) sign in regex


I need to produce a regex pattern that verifies UTC offsets. These are typically formatted as UTC+05:30 or UTC-01:00. It seemed simple enough to match as follows (being permissive for spaces):

^UTC[ ]?[+\-±][ ]?[01][0-9]:[034][05]$

[Note: I updated this pattern based on feedback from @Barmar]

There is a corner case in which the offset is written UTC±00:00. However, the plus-minus sign is throwing things off. Using PHP, for example:

echo preg_match("/^±$/","±");
echo preg_match("/^[±]$/","±");
echo preg_match("/^[\±]$/","±");

This prints 1 (a match) for the first call but 0 (no match) for the other two.

So my question is, does the ± require special handling in Regex? I can't find any reference to this symbol in the docs. Thx.


Solution

  • It looks like @Barmar probably solved the first issue you were having (matching the UTC string). However, to explain what you were seeing with:

    preg_match("/^±$/","±"); // true
    preg_match("/^[±]$/","±"); // false
    preg_match("/^[\±]$/","±"); // false
    

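    The ± character is two bytes long in UTF-8 (0xC2 0xB1), and without the /u modifier preg_match treats the pattern and the subject as byte strings, so it sees two characters. Inside a character class, [±] therefore means "either of these two bytes" and matches only one byte, which is why ^[±]$ fails against the two-byte subject, while the bare ± in the first pattern matches as a two-byte sequence. To match the way you expect, use the /u modifier. It tells preg_match to treat both the pattern and the subject as UTF-8, so ± is interpreted as a single character instead of two.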

    preg_match("/^[±]$/u","±"); // true
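
    The byte-level behaviour described above can be checked directly. This is a minimal sketch, assuming the script file is saved as UTF-8 (so the literal ± is the two bytes C2 B1); the variable names are only for illustration.

    ```php
    <?php
    // Assumption: this file is saved as UTF-8, so "±" is the bytes 0xC2 0xB1.
    $sign = "±";

    var_dump(strlen($sign));  // int(2): strlen counts bytes, not characters
    var_dump(bin2hex($sign)); // string(4) "c2b1"

    // Byte mode (no /u): [±] is a class of two single-byte alternatives,
    // so it can consume only one of the two bytes in the subject.
    $byteMode = preg_match('/^[±]$/', $sign);

    // Two consecutive class matches consume both bytes, which confirms
    // that the class is matching byte by byte.
    $twoBytes = preg_match('/^[±]{2}$/', $sign);

    // UTF-8 mode (/u): pattern and subject are decoded, so ± is one character.
    $utf8Mode = preg_match('/^[±]$/u', $sign);

    var_dump($byteMode, $twoBytes, $utf8Mode); // int(0) int(1) int(1)
    ```

    Note that preg_match returns the integers 1 and 0 (or false on error) rather than booleans.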
    

    And to include an example that matches your UTC sample:

    // with the /u modifier (works as expected)
    preg_match("/^UTC[ ]?[+\-±][ ]?[01][0-9]:[034][05]$/u", "UTC±05:30"); // true
    
    // without the /u modifier (does not match)
    preg_match("/^UTC[ ]?[+\-±][ ]?[01][0-9]:[034][05]$/", "UTC±05:30"); // false
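
    If you want to wrap this up for reuse, something like the following might work. It is only a sketch: the function names isUtcOffset and isUtcOffsetBytes are hypothetical, and the pattern is the one from the question, unchanged apart from how ± is written. The first variant writes ± as the code point \x{00B1}, which needs /u but does not depend on how the script file is encoded. The second avoids /u by using alternation, so ± is matched as a literal two-byte sequence; it assumes both the file and the input are UTF-8.

    ```php
    <?php
    // Hypothetical helper: UTF-8 mode, with ± written as a code point escape.
    function isUtcOffset(string $value): bool
    {
        $pattern = '/^UTC[ ]?[+\-\x{00B1}][ ]?[01][0-9]:[034][05]$/u';
        // preg_match returns 1, 0, or false (e.g. for invalid UTF-8 input).
        return preg_match($pattern, $value) === 1;
    }

    // Hypothetical helper: byte mode, with ± moved out of the character class.
    // Assumption: this file is saved as UTF-8.
    function isUtcOffsetBytes(string $value): bool
    {
        $pattern = '/^UTC[ ]?(?:[+\-]|±)[ ]?[01][0-9]:[034][05]$/';
        return preg_match($pattern, $value) === 1;
    }

    // "\u{00B1}" (PHP 7.0+) produces the UTF-8 bytes for ±.
    var_dump(isUtcOffset("UTC\u{00B1}00:00"));      // bool(true)
    var_dump(isUtcOffset("UTC+05:30"));             // bool(true)
    var_dump(isUtcOffset("UTC - 01:00"));           // bool(true)
    var_dump(isUtcOffset("UTC*05:30"));             // bool(false)
    var_dump(isUtcOffsetBytes("UTC\u{00B1}00:00")); // bool(true)
    ```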