php regex euro

PHP regex to find unencoded euro symbol


I'm working in PHP. I'd like to find numbers in a sentence that are preceded by a currency symbol and return just the number. For example, searching "I spent €100 on shoes" should return "100".

I've got this working for $ and £:

'/[$£]([0-9.]{1,})/'

But adding the € euro symbol doesn't work. (The sentences come from parsed emails, so I don't need to match the HTML entity &euro;.)

preg_match_all('/[€]([0-9.]{1,})/', $sentence, $match);

I've found the following on SO: "regex for currency (euro)", but it doesn't encode the euro symbol.

To encode the euro symbol, I've tried:

/[\x{20ac}]([0-9.]{1,})/u
"[^-a-zA-Z0-9.:,!+£$ \\ ". chr(164) ."]"

But I can't figure it out. Any help?


Solution

  • When I put this in:

     echo preg_match("#€[0-9]{1,}#", "€1" )?1:0;
    

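    I get 1, so you might not need Unicode handling at all, provided the script file and the input string use the same encoding. But if you would like to build the UTF-8 character explicitly nevertheless, I found this helper in a comment under the PHP docs: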

    function unichr($u) {
        return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
    }
    

    To get the €, call unichr(8364). Use that in place of the euro sign above and you'll be good. I should note that I tested both; here is the Unicode version:

    echo preg_match("#".unichr(8364)."\s*([0-9]{1,})#u", unichr(8364). "1" )?1:0;
    

    You might want to do str_replace('€', unichr(8364), $str); first...

    PS. You probably also want to allow for optional spaces and an optional two-digit decimal part: #€\s*([0-9]{1,}(\.[0-9]{2})?)#
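
Putting the pieces together, here is a minimal sketch of one way to pull out the amounts for $, £ and € in a single pass. It assumes the script file and $sentence are both UTF-8; emails that arrive in another charset (Windows-1252 or ISO-8859-15, say) would need converting first, for example with mb_convert_encoding(). The function name extract_amounts is made up for illustration and is not a PHP built-in.

```php
<?php
// Hypothetical helper (name chosen for illustration only): returns the
// numeric part of every amount that is prefixed by $, £ or €.
// Assumption: this file and $sentence are both UTF-8 encoded.
function extract_amounts(string $sentence): array
{
    // \x{20AC} is the euro sign's code point. The /u modifier makes PCRE
    // treat the pattern and the subject as UTF-8, so multi-byte symbols
    // inside the character class are matched as whole characters.
    $pattern = '/[$£\x{20AC}]\s*([0-9]+(?:\.[0-9]+)?)/u';

    // With /u, preg_match_all() fails on a subject that is not valid
    // UTF-8, so fall back to an empty list instead of reading a missing key.
    preg_match_all($pattern, $sentence, $matches);

    return $matches[1] ?? [];
}

// Captures "100" and "12.50".
print_r(extract_amounts('I spent €100 on shoes and £12.50 on socks'));
```

On PHP 7.0 and later a double-quoted "\u{20AC}" also yields the euro sign, and from PHP 7.2 mb_chr(0x20AC, 'UTF-8') does the same (with the mbstring extension), so the unichr() helper may not be needed on newer versions.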