I am trying to scrape the prices block out of a webpage, and I want to match the contents between the opening and closing paragraph tags that contain the prices. The problem is that in the HTML source this block is split across multiple lines with lots of whitespace. Here is a sample of the output: http://pastebin.com/hfeuHqTN
I am trying to use:
$pricesClass = '/<p class="price-wrap">\n(.*)/';
preg_match_all($pricesClass, $page, $pricesMatches);
How can I match the whole of the paragraph with the class of price-wrap until the closing paragraph tag?
At the moment it just matches the first two lines up to:
<p class="price-wrap"><strong class="product-price" itemprop="price">
I would like to match the whole thing, e.g.:
<p class="price-wrap"><strong class="product-price" itemprop="price"> £120</strong> was <del>£186.00</del></p>
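For reference, the attempt above stops early because `.` does not match a newline by default, so `(.*)` only runs to the end of a single line. The sketch below is a minimal illustration, assuming markup shaped like the pastebin sample (the `$page` string is hypothetical). It adds the `s` modifier so `.` also matches newlines, and a lazy `.*?` so each match ends at the first `</p>`. It is fragile compared to a real parser: nested tags, attribute order, or extra classes will break it.

```php
<?php
// Hypothetical sample shaped like the pastebin output; the real page may differ.
$page = <<<HTML
<p class="price-wrap">
    <strong class="product-price" itemprop="price">
        £120
    </strong>
    was <del>£186.00</del>
</p>
HTML;

// s modifier: "." also matches newlines, so a match can span several lines.
// .*? is lazy, so each match ends at the first </p> instead of the last one.
$pricesClass = '~<p class="price-wrap">.*?</p>~s';
preg_match_all($pricesClass, $page, $pricesMatches);

$paragraphs = [];
foreach ($pricesMatches[0] as $match) {
    // Collapse each run of whitespace to a single space for display.
    $paragraphs[] = preg_replace('/\s+/u', ' ', $match);
}
echo implode("\n", $paragraphs), "\n";
```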
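Use a proper HTML parser such as DOMDocument to find the paragraph, and use preg_replace() with \s+ only to normalise the whitespace in the extracted text. In PCRE, \s matches space, tab, line feed, carriage return, vertical tab and form feed; with the /u modifier it also matches other Unicode whitespace.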
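libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML(file_get_contents("http://thesite.com"));
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//p[@class='price-wrap']") as $pText) {
    // Collapse each whitespace run to one space (not ""), so "£120 was £186.00" keeps its word breaks
    echo trim(preg_replace('/\s+/u', ' ', $pText->textContent)), "\n";
}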
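To try the approach without fetching a live page, here is a minimal self-contained sketch that parses an inline string instead of http://thesite.com. The markup is a hypothetical stand-in for the pastebin sample, and the `dom` extension is assumed to be enabled. The `http-equiv` meta tag is included because DOMDocument::loadHTML() otherwise tends to read the input as ISO-8859-1, which garbles the `£` sign.

```php
<?php
// Hypothetical markup standing in for the fetched page (the real site may differ).
$html = <<<HTML
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>
<p class="price-wrap">
    <strong class="product-price" itemprop="price">
        £120
    </strong>
    was <del>£186.00</del>
</p>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$prices = [];
foreach ($xpath->query("//p[@class='price-wrap']") as $p) {
    // textContent joins all descendant text nodes; collapse and trim its whitespace.
    $prices[] = trim(preg_replace('/\s+/u', ' ', $p->textContent));
}
print_r($prices);
```

Note that `@class='price-wrap'` is an exact string comparison; if the element can carry extra classes, `contains(concat(' ', normalize-space(@class), ' '), ' price-wrap ')` is the usual XPath 1.0 idiom.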