I need to produce a regex pattern that verifies UTC offsets. These are typically formatted as UTC+05:30 or UTC-01:00. It seemed simple enough to match as follows (being permissive about spaces):
^UTC[ ]?[+\-±][ ]?[01][0-9]:[034][05]$
[Note: I updated this pattern based on feedback from @Barmar]
There is an edge case in which the offset is written UTC±00:00. However, the plus-minus sign is throwing things off. Using PHP, for example:
echo preg_match("/^±$/","±");
echo preg_match("/^[±]$/","±");
echo preg_match("/^[\±]$/","±");
The first call reports a match (1), but the other two report no match (0).
So my question is: does the ± require special handling in regex? I can't find any reference to this symbol in the docs. Thanks.
It looks like @Barmar probably solved the first issue you were having (matching the UTC string). However, to explain what you were seeing with:
preg_match("/^±$/","±"); // true
preg_match("/^[±]$/","±"); // false
preg_match("/^[\±]$/","±"); // false
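The ± character is two bytes long in UTF-8 (0xC2 0xB1), and without the /u modifier preg_match treats both the pattern and the subject as byte strings. Inside a character class, [±] therefore means "one of these two bytes", which can never match the whole two-byte string. To match the way you expect, you have to use the /u modifier. This tells preg_match to treat the pattern and subject as UTF-8, so ± is interpreted as a single character instead of two.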
preg_match("/^[±]$/u","±"); // true
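To see the byte-versus-character difference directly, here is a minimal sketch. It assumes ± is encoded as UTF-8, and it spells the bytes out as "\xC2\xB1" (in a variable named $pm, chosen just for this illustration) so the result does not depend on how the source file is saved.

```php
<?php
// Assumption: ± is UTF-8 encoded, i.e. the two bytes 0xC2 0xB1.
$pm = "\xC2\xB1";

var_dump(strlen($pm));   // int(2): strlen counts bytes, not characters
var_dump(bin2hex($pm));  // string(4) "c2b1"

// Byte mode (no /u): the class holds two separate bytes and matches ONE of them.
var_dump(preg_match('/^[' . $pm . ']$/', $pm));     // int(0)
// Asking for two class members in a row matches, which shows the split.
var_dump(preg_match('/^[' . $pm . ']{2}$/', $pm));  // int(1)

// UTF-8 mode (/u): the class holds a single character.
var_dump(preg_match('/^[' . $pm . ']$/u', $pm));    // int(1)
```

The {2} case is only there to demonstrate what byte mode is doing; it is not a recommended way to match the character.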
And to include an example that matches your UTC sample:
// with the /u modifier (works as expected)
preg_match("/^UTC[ ]?[+\-±][ ]?[01][0-9]:[034][05]$/u", "UTC±05:30"); // true
// without the /u modifier (does not match)
preg_match("/^UTC[ ]?[+\-±][ ]?[01][0-9]:[034][05]$/", "UTC±05:30"); // false
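If adding the /u modifier is not an option for some reason, one possible workaround (a sketch, not something from the question) is to take ± out of the character class and put it in an alternation instead. Outside a class, the two bytes are matched as a literal sequence, so byte mode works. The variable names $pm and $pattern are just for illustration, and ± is again assumed to be UTF-8.

```php
<?php
// Assumption: ± is UTF-8 encoded, i.e. the two bytes 0xC2 0xB1.
$pm = "\xC2\xB1";

// Same pattern as above, but the sign is (?:[+\-]|±) rather than [+\-±].
// No /u modifier: ± is matched as a literal two-byte sequence.
$pattern = '/^UTC[ ]?(?:[+\-]|' . $pm . ')[ ]?[01][0-9]:[034][05]$/';

var_dump(preg_match($pattern, "UTC" . $pm . "00:00")); // int(1)
var_dump(preg_match($pattern, "UTC+05:30"));           // int(1)
var_dump(preg_match($pattern, "UTC - 01:00"));         // int(1)
var_dump(preg_match($pattern, "UTC*05:30"));           // int(0)
```

The /u version is still the simpler choice when the input is known to be valid UTF-8.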