Tags: php, regex, preg-replace, preg-match, simple-html-dom

preg_match or preg_replace to get only number from html code


I am having a bit of trouble getting only the number from a specific part of the HTML code. I am parsing a page, and the output of the content looks like this:

<div class="priceitem"> 1,098&nbsp;USD <span id="XUwt-price-mb-aE068a15dcca8E168a15dcca8-tooltipIcon" class="tooltip-icon afterPrice info-icon"> <svg class="" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200" width="100%" height="100%"><use xlink:href="#common-icon-icon-info"></use></svg> </span> <br></div>

I am using simplehtmldom to get the content, so everything inside div.priceitem is output along with the price. Can I somehow use preg_match to match a pattern, or preg_replace, to get only the price number, like 1,098?

The price can change: sometimes it is only 29 USD, which is output as 29&nbsp;USD, and sometimes it is 305&nbsp;USD. Prices over 1,000 contain a comma as a thousands separator, which I don't really need.

Here is my attempt:

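foreach($html->find('div.priceitem') as $element) {
    // preg_match($pattern, $subject, $matches) returns 1 or 0; the captured text goes into $matches
    if (preg_match('/([^\s]+)/', $element->innertext, $matches)) {
        // prints "1,098&nbsp;USD", not just the number: &nbsp; here is an entity, not whitespace
        echo $matches[1];
    }
}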

Solution

  • Here's a pattern that should get you all possible prices:

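    (\d{1,3}(?:,\d{3})*)(?=&nbsp;USD)

    The idea is that a price starts with a block of 1-3 digits, optionally followed by any number of comma-prefixed blocks of exactly 3 digits. The lookahead (?=&nbsp;USD) anchors the match to the currency suffix without including it, so the captured group holds only the number (e.g. 1,098, 305 or 29).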


    However, if you only want the digits without the thousands separator (e.g. 1098 instead of 1,098), the simplest option is to remove the comma from the match afterwards: str_replace(',', '', $string);
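A minimal PHP sketch putting the regex and the comma removal together, assuming the price is always immediately followed by the literal &nbsp;USD entity as in the sample markup. It uses a tightened form of the pattern (groups after a comma have exactly three digits), and extractPrice() is a hypothetical helper name, not part of simplehtmldom. The sample strings stand in for $element->innertext:

```php
<?php
// Sketch only. Assumption: the price is always followed by the literal
// entity "&nbsp;USD", as in the markup from the question.

/**
 * Hypothetical helper: returns the price as an integer,
 * or null when no "...&nbsp;USD" price is found.
 */
function extractPrice(string $innertext): ?int
{
    // 1-3 leading digits, then any number of ",ddd" thousands groups,
    // followed (lookahead, not consumed) by "&nbsp;USD".
    $pattern = '/(\d{1,3}(?:,\d{3})*)(?=&nbsp;USD)/';

    if (preg_match($pattern, $innertext, $matches) !== 1) {
        return null;
    }

    // Drop the thousands separator before casting: "1,098" -> "1098".
    return (int) str_replace(',', '', $matches[1]);
}

// Stand-ins for $element->innertext from the question's loop.
$samples = [
    ' 1,098&nbsp;USD <span class="tooltip-icon">...</span> <br>',
    ' 305&nbsp;USD <br>',
    ' 29&nbsp;USD <br>',
];

foreach ($samples as $innertext) {
    var_dump(extractPrice($innertext)); // int(1098), int(305), int(29)
}
```

Inside the loop from the question, the call would be extractPrice($element->innertext). Returning null instead of 0 keeps "no price found" distinguishable from a real price of zero.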