Tags: php, regex, preg-match

PHP regex: using preg_match to capture the number before a multiword string


I am trying to extract the number 203 from this sample.

Here is the sample I am running the regex against:

<span class="crAvgStars" style="white-space:no-wrap;"><span class="asinReviewsSummary" name="B00KFQ04CI" ref="cm_cr_if_acr_cm_cr_acr_pop_" getargs="{&quot;tag&quot;:&quot;&quot;,&quot;linkCode&quot;:&quot;sp1&quot;}">

<a href="https://www.amazon.com/Moto-1st-Gen-Screen-Protector/product-reviews/B00KFQ04CI/ref=cm_cr_if_acr_cm_cr_acr_img/181-2284807-1957201?ie=UTF8&linkCode=sp1&showViewpoints=1" target="_top"><img src="https://images-na.ssl-images-amazon.com/images/G/01/x-locale/common/customer-reviews/ratings/stars-4-5._CB192238104_.gif" width="55" alt="4.3 out of 5 stars" align="absbottom" title="4.3 out of 5 stars" height="12" border="0" /></a>&nbsp;</span>(<a href="https://www.amazon.com/Moto-1st-Gen-Screen-Protector/product-reviews/B00KFQ04CI/ref=cm_cr_if_acr_cm_cr_acr_txt/181-2284807-1957201?ie=UTF8&linkCode=sp1&showViewpoints" target="_top">203 customer reviews</a>)</span>

Here is the code I am using, which does not work:

preg_match('/^\D*(\d+)customer reviews.*$/',$results[0], $clean_results);
echo "<pre>";
print_r( $clean_results);
echo "</pre>";
//expecting 203

It just returns:

<pre>array ()</pre>
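An empty `$clean_results` array only says that nothing was captured. The return value of `preg_match()` tells you why: `1` for a match, `0` for no match, and `false` for an error. A minimal sketch of checking it, assuming PHP 8.0+ (for `preg_last_error_msg()`); `describe_match()` is a hypothetical helper and the subject string is a shortened stand-in for the sample:

```php
<?php
// Hypothetical helper: reports whether a pattern matched, did not match,
// or failed with a PCRE error, using preg_match()'s return value.
function describe_match(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        return 'error: ' . preg_last_error_msg();
    }
    return $result === 1 ? 'matched: ' . $matches[0] : 'no match';
}

// Shortened stand-in: a digit run ("12") appears before the review count.
$subject = 'x="12" 203 customer reviews';

echo describe_match('/^\D*(\d+)customer reviews.*$/', $subject), "\n"; // no match
echo describe_match('/(\d+) customer reviews/', $subject), "\n";       // matched: 203 customer reviews
```

Here the original pattern returns `0`, so the empty array is an ordinary failed match rather than a regex error.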

Solution

  • Your regexp has two problems.

    First, there are other numbers in the string before the number of customer reviews (such as the 00 in B00KFQ04CI, 4.3 out of 5 stars, and height="12"), but \D* cannot match them: it matches only non-digits, so the pattern succeeds only if there are no digits anywhere between the beginning of the string and the number of reviews.

    Second, your pattern has no space between (\d+) and customer reviews, but the input string has a space there.

    There's no need to match the parts of the string before and after the number of customer reviews; just match the part you care about.

    if (preg_match('/(\d+) customer reviews/', $results[0], $clean_results)) {
        $num_reviews = $clean_results[1]; // "203"
    }
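A self-contained sketch of both problems and the fix. It assumes a shortened stand-in for `$results[0]` (the `$html` variable is hypothetical) that keeps the digits appearing before the review count in the original sample. The `(int)` cast is an addition, in case a number is wanted rather than the captured string.

```php
<?php
// Shortened stand-in for $results[0]; like the real sample, it has digits
// ("00" in B00KFQ04CI, "4.3", "12") before the review count.
$html = '<span name="B00KFQ04CI"><img alt="4.3 out of 5 stars" height="12" />'
      . '</span>(<a href="#">203 customer reviews</a>)</span>';

// Original pattern: \D* cannot skip digits, so (\d+) is forced onto the first
// digit run, which is not followed by "customer reviews". No match.
var_dump(preg_match('/^\D*(\d+)customer reviews.*$/', $html)); // int(0)

// Shows where the capture group is forced to start: the first digits in the string.
preg_match('/^\D*(\d+)/', $html, $m);
echo $m[1], "\n"; // 00

// Fixed pattern: no anchors, and a literal space before "customer reviews".
if (preg_match('/(\d+) customer reviews/', $html, $matches)) {
    $num_reviews = (int) $matches[1];
    echo $num_reviews, "\n"; // 203
}
```

`preg_match()` stops at the leftmost match, so this relies on "customer reviews" appearing once in the page; if it could appear several times, `preg_match_all()` would collect every occurrence.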