dom curl screen-scraping

cURL class that works like Simple HTML DOM?


I've been using both cURL and simple_html_dom for a while. For anyone not familiar with Simple HTML DOM: it lets you select elements with CSS-style selectors, without the hassle of regexes, exploding strings, and so on.

E.g.

$html = file_get_html($obj->loc);
$item['title'] = $html->find('#Prod-Name h1',0)->plaintext;

However, as far as I'm aware, this does not support cookies the way cURL does. Is there something out there that does?

I would be interested to hear people's experiences with screen scraping and bot creation.


Solution

  • You can download the page with cURL, which handles cookies, and then parse the response with the parsing library of your choice. I use this approach sometimes, though I'm not entirely happy with it; it would be nice if PHP had some decent scraping libraries, and nicer still if they were built in.
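
A minimal sketch of that approach, under some assumptions: the function names fetch_with_cookies and extract_title and the cookie-jar path are hypothetical, and the parsing step uses PHP's built-in DOMDocument/DOMXPath rather than Simple HTML DOM so the example has no external dependency. If you prefer Simple HTML DOM, the downloaded string can be handed to its str_get_html() function instead of file_get_html().

```php
<?php
// Hypothetical helper: download a page with cURL, keeping cookies in a jar
// file so they persist across requests (e.g. after a login request).
function fetch_with_cookies(string $url, string $cookieJar): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_COOKIEJAR      => $cookieJar, // write cookies here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieJar, // read cookies from here on each request
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical helper: text of the first <h1> inside the element with
// id "Prod-Name" (the same target as the '#Prod-Name h1' selector above),
// or null if there is no such element.
function extract_title(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="Prod-Name"]//h1');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Usage, assuming $obj->loc holds the page URL as in the question:
// $html = fetch_with_cookies($obj->loc, '/tmp/cookies.txt');
// $item['title'] = extract_title($html);
```

Keeping the download and the parsing in separate functions means the cookie handling stays entirely in cURL, and the parser can be swapped for whichever library you like without touching the request code.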