php, web-scraping, html-parsing, text-extraction

Extract numeric value from a strictly formatted string found in an HTML document


I have several strings that were fetched with cURL from another website. Each string contains the page's entire HTML, and somewhere in each page there is a paragraph like one of the following:

<p>Displaying 1-15 of 15 items beginning with A</p>

or

<p>Displaying 1-20 of 33 items beginning with B</p>

What I need to do is just extract the total values from these strings (15 or 33 in the above case).

I'm not sure what the best method to extract the values is.


Solution

  • A brute force approach:

    http://php.net/manual/en/function.preg-match-all.php

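    preg_match_all('/<p>Displaying (\d+)-(\d+) of (\d+) items beginning with ([A-Z]+)<\/p>/', $subject, $matches);

    Note that the slash in </p> has to be escaped because / is also the pattern delimiter. The totals (15 or 33 in the examples) are in the third capture group, $matches[3].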
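Since each page contains only one such paragraph, the same idea can be sketched with preg_match, capturing only the total. This is a minimal sketch, not a definitive implementation: the helper name extractTotal and the sample strings are made up for illustration, and it assumes the paragraph always has exactly the format shown in the question (and PHP 7.1 or later for the nullable return type).

```php
<?php
// Hypothetical helper (not from the original answer): returns the total
// item count found in the page HTML, or null if the paragraph is missing.
function extractTotal(string $html): ?int
{
    // '#' is used as the delimiter so the '/' in '</p>' needs no escaping.
    // Only the total (the number after "of") is captured.
    $pattern = '#<p>Displaying \d+-\d+ of (\d+) items beginning with [A-Z]+</p>#';

    if (preg_match($pattern, $html, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

// Sample inputs modelled on the paragraphs in the question.
$pageA = '<html><body><p>Displaying 1-15 of 15 items beginning with A</p></body></html>';
$pageB = '<html><body><p>Displaying 1-20 of 33 items beginning with B</p></body></html>';

var_dump(extractTotal($pageA)); // int(15)
var_dump(extractTotal($pageB)); // int(33)
```

If the markup around the text can vary (attributes on the <p> tag, extra whitespace), the pattern would need to be loosened accordingly, or the paragraph located with an HTML parser first.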