
Regarding regexp in PHP


I'm decent at PHP (far from an expert) but a pure novice when it comes to regexp and scraping. I wanted to do a little bit of scraping to help with some research and to educate myself, but I've run into a problem. I want to extract the prize from the following part of a page:

<th valign="top"> Prize pool:
</th>
<td> $75,000
</td></tr>

Needless to say, the prize pool value will change. I want to get the prize, and only the prize from this part (in this example the script should print out $75,000).

This is what I have so far:

preg_match('/Prize pool:\n<\/th>\n<td>(.*)/i', $file_string, $prize);

However, this prints out:

Prize pool:
</th> 
<td> $75,000

Solution

  • preg_match('/Prize pool:.+(\$\d+(?:\.|,)\d+)/is', $file_string, $prize);
    echo '<pre>' . print_r($prize, 1) . '</pre>';
    

    Like this.

    A little explanation

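    . - matches any single character except the newline character "\n" (by default)

    + - means one or more repetitions of the preceding element

    So, .+ means that "Prize pool:" must be followed by one or more characters of any kind. Note that + is greedy: it takes as many characters as it can, so if several amounts follow "Prize pool:" the last one is captured. Use .+? to stop at the first one.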

    (...) - this is called a capturing group (a "pocket"). The text matched by each group is stored in its own element of the $prize array: $prize[0] holds the whole match, $prize[1] the first group, and so on

    $ in a pattern normally means the end of the subject (or the end of a line with the m modifier), so to match a literal dollar sign we have to escape it: \$

    \d - means one digit from 0 to 9, and \d+ means one or more digits

    (?:...) - this is a group too, but its match is not saved in $prize, because of the ?: after (

    As we know, . is any single character, so to match a literal dot we have to escape it as \. and \.|, means we are looking for either . or ,

    /here pattern/i - the i modifier makes the regex case-insensitive

    /here pattern/s - the s modifier makes the metacharacter . match newline characters as well
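
To see these pieces working together, here is a minimal sketch that runs the pattern against the snippet from the question. The variable $two_amounts is a made-up input for illustration (it is not from the original page), and [.,] is used in the last two patterns as a shorter equivalent of (?:\.|,).

```php
<?php
// Sample input copied from the question; in the real script $file_string
// would hold the downloaded page.
$file_string = "<th valign=\"top\"> Prize pool:\n</th>\n<td> \$75,000\n</td></tr>";

// 1. The pattern from the answer: the amount is in the first capturing group,
//    while $prize[0] holds the whole matched text.
preg_match('/Prize pool:.+(\$\d+(?:\.|,)\d+)/is', $file_string, $prize);
echo $prize[1], "\n"; // $75,000

// 2. Without the s modifier, . cannot cross the newline right after
//    "Prize pool:", so nothing matches and preg_match() returns 0.
$without_s = preg_match('/Prize pool:.+(\$\d+(?:\.|,)\d+)/i', $file_string);
var_dump($without_s); // int(0)

// 3. Greedy versus lazy, on a hypothetical input with two amounts.
$two_amounts = "Prize pool:\n</th>\n<td> \$75,000\n</td></tr>\n<td> \$1,500\n</td>";
preg_match('/Prize pool:.+(\$\d+[.,]\d+)/is', $two_amounts, $greedy);
preg_match('/Prize pool:.+?(\$\d+[.,]\d+)/is', $two_amounts, $lazy);
echo $greedy[1], "\n"; // $1,500  (greedy .+ runs on to the last amount)
echo $lazy[1], "\n";   // $75,000 (lazy .+? stops at the first amount)
```

If the page may contain more than one dollar amount after "Prize pool:", the lazy form .+? is probably the safer choice.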