Tags: php, html, regex, skip, strip-tags

How to skip HTML headings and find a number with regex?


I want to find a number, but skip the digits in H1, H2, H3 and so on (all possible HTML heading tags).

Example 1:

<div>Today is good day. I got<h3>3<span> lotto tickets</span></h3></div>

Example 2:

I want to buy lotto tickets. <h1>Maybe 10 is enough</h1>

Example 3:

I want to buy lotto tickets. <h1>4 or 5</h1> is enough.

I have this code:

lotto tickets\D{0,15}(\d+\,\d+|\d+\.\d+|\d+)

But every time I get the numbers from the HTML tags: <h3> (3), <h1> (1). How can I skip them?

In example 1 I should get nothing.

In example 2 I should get the number 10.

In example 3 I should get the number 4.

(Numbers can contain . or , for example: 2.5)


Solution

  • This is one of those instances where perhaps regex isn't being used correctly.

    Yes, you could do it with regex alone, but an easier way (and one that is also faster to run) is to call strip_tags() on your string first to get rid of all the HTML tags, and then run a standard regex for the numbers on the cleaned text.

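    $string = "<h3>This is post number 10</h3>";
    // Strip the HTML first, leaving "This is post number 10".
    $cleanString = strip_tags($string);
    // $number[0] holds the first standalone integer, "10".
    preg_match("%\b[0-9]+\b%", $cleanString, $number);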
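Applied to the three examples from the question, the same idea might look like the sketch below. It is only an illustration: findTicketCount() is a hypothetical helper name, and the pattern is the one from the question with its three alternatives condensed into \d+(?:[.,]\d+)?, which should match the same numbers.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// number found after "lotto tickets" once the markup is removed, or null.
function findTicketCount(string $html): ?string
{
    // Remove the tags so digits in tag names such as <h3> cannot match.
    $text = strip_tags($html);

    // Pattern from the question: up to 15 non-digits, then a number that
    // may use "," or "." as a decimal separator.
    if (preg_match('/lotto tickets\D{0,15}(\d+(?:[.,]\d+)?)/', $text, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(findTicketCount('<div>Today is good day. I got<h3>3<span> lotto tickets</span></h3></div>')); // NULL
var_dump(findTicketCount('I want to buy lotto tickets. <h1>Maybe 10 is enough</h1>'));                 // string(2) "10"
var_dump(findTicketCount('I want to buy lotto tickets. <h1>4 or 5</h1> is enough.'));                  // string(1) "4"
```

In example 1 the number comes before "lotto tickets", so nothing follows the phrase after the tags are stripped and the helper returns null, which matches the expected "nothing".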