I'm decent at PHP (far from an expert) but a pure novice when it comes to regexp and scraping. I wanted to do a little bit of scraping to help with some research and to educate myself, but I've run into a problem. I want to extract the prize from the following part of a page:
<th valign="top"> Prize pool:
</th>
<td> $75,000
</td></tr>
Needless to say, the prize pool value will change. I want to get the prize, and only the prize from this part (in this example the script should print out $75,000).
This is what I have so far:
preg_match('/Prize pool:\n<\/th>\n<td>(.*)/i', $file_string, $prize);
However, this prints out:
Prize pool:
</th>
<td> $75,000
preg_match('/Prize pool:.+(\$\d+(?:\.|,)\d+)/is', $file_string, $prize);
echo '<pre>' . print_r($prize, 1) . '</pre>';
Something like this should do it.
A little explanation:
. - matches any single character except the newline character "\n"
+ - means one or more repetitions
So .+ means that "Prize pool:" must be followed by one or more characters of any kind.
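One thing worth knowing about .+ is that it is greedy: it takes as many characters as it can and then backs up. The sketch below uses a made-up $sample string (not the real page) with two dollar amounts after the label, and compares .+ with the lazy form .+?, which stops at the first amount instead of the last. Which one you want depends on the page, so treat this as an illustration rather than a fix.

```php
<?php
// Made-up sample input with two dollar amounts after the label.
$sample = "Prize pool:\n</th>\n<td> \$75,000\n</td></tr>\n<td> \$1,500\n</td>";

// Greedy: .+ grabs as much as possible, then backtracks to the LAST amount.
preg_match('/Prize pool:.+(\$\d+(?:\.|,)\d+)/is', $sample, $greedy);

// Lazy: .+? grabs as little as possible and stops at the FIRST amount.
preg_match('/Prize pool:.+?(\$\d+(?:\.|,)\d+)/is', $sample, $lazy);

echo $greedy[1], "\n"; // $1,500
echo $lazy[1], "\n";   // $75,000
```

If the page only ever has one dollar amount after "Prize pool:", both forms give the same result.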
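(...) - this is called a capturing group. Each capturing group in the regex is stored in its own element of the result array ($prize): $prize[0] holds the whole match and $prize[1] holds the first group.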
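To make the layout of $prize concrete, here is a small sketch that runs the pattern against the snippet from the question. The $file_string value is typed in by hand here; in the real script it would come from the downloaded page.

```php
<?php
// The HTML fragment from the question, typed in by hand for this example.
$file_string = "<th valign=\"top\"> Prize pool:\n</th>\n<td> \$75,000\n</td></tr>";

preg_match('/Prize pool:.+(\$\d+(?:\.|,)\d+)/is', $file_string, $prize);

// $prize[0] is everything the pattern matched, label and tags included.
// $prize[1] is only what the first pair of parentheses matched.
echo $prize[1], "\n"; // $75,000
```

Printing $prize[0] instead would show the label and tags as well, which is probably why the original attempt printed more than the prize.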
$ - in a pattern means the end of the string (or the end of a line in multiline mode), so to match a literal dollar sign we have to escape it as \$
\d - matches a single digit from 0 to 9, and \d+ matches one or more digits
(?:...) - this is a group too, but it is not saved in $prize because of the ?: after the (
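The effect of ?: is easiest to see by counting the elements of the result array. This is a minimal sketch on a made-up one-line string; the variable names are only for illustration.

```php
<?php
$text = 'Prize pool: $75,000';

// Inner group WITHOUT ?: is captured, so the separator gets its own slot.
preg_match('/(\$\d+(\.|,)\d+)/', $text, $capturing);

// Inner group WITH ?: only groups the alternatives and is not stored.
preg_match('/(\$\d+(?:\.|,)\d+)/', $text, $nonCapturing);

print_r($capturing);    // 3 elements: '$75,000', '$75,000', ','
print_r($nonCapturing); // 2 elements: '$75,000', '$75,000'
```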
As we know, . matches any single character, so to match a literal dot we need to escape it as \.
\.|, - means we are looking for either . or ,
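As a side note, the same "dot or comma" choice can also be written as the character class [.,], where the dot needs no escaping. The sketch below checks both spellings against a few made-up amounts. It also shows that the pattern as written allows only one separator, so "$1,000,000" would be cut short; repeating the group with * is one possible way around that, assuming amounts like that can appear on the page.

```php
<?php
$alternation = '/\$\d+(?:\.|,)\d+/';
$charClass   = '/\$\d+[.,]\d+/';

// Both spellings should agree on every sample.
foreach (['$75,000', '$12.50', '$75 000'] as $amount) {
    $a = preg_match($alternation, $amount);
    $b = preg_match($charClass, $amount);
    echo $amount, ': ', $a, ' ', $b, "\n"; // 1 1, 1 1, 0 0
}

// Only one separator is allowed, so a longer amount is truncated.
preg_match($charClass, 'Prize: $1,000,000 total', $short);
echo $short[0], "\n"; // $1,000

// Repeating the "separator + digits" group picks up the whole amount.
preg_match('/\$\d+(?:[.,]\d+)*/', 'Prize: $1,000,000 total', $full);
echo $full[0], "\n"; // $1,000,000
```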
/here pattern/i - the i modifier makes the regex case-insensitive
/here pattern/s - the s modifier makes the metacharacter . match the newline character as well
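A quick way to check what the two modifiers do is to run the same pattern with and without each of them. The strings below are made up for the test; preg_match returns 1 on a match and 0 otherwise.

```php
<?php
// i: letter case is ignored.
$withI    = preg_match('/prize POOL:/i', 'Prize pool:'); // 1
$withoutI = preg_match('/prize POOL:/',  'Prize pool:'); // 0

// s: the dot is allowed to match "\n".
$withS    = preg_match('/pool:.<\/th>/s', "pool:\n</th>"); // 1
$withoutS = preg_match('/pool:.<\/th>/',  "pool:\n</th>"); // 0

echo $withI, $withoutI, $withS, $withoutS, "\n"; // 1010
```

Since the prize sits on a different line from the "Prize pool:" label, the s modifier is the one the pattern above really depends on.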