xpath, web-scraping, web-crawler, scrapy, phpcrawl

Can PHPCrawl be used for scraping websites, and how does it differ from Scrapy?


I want to scrape a few websites, and many people suggested Scrapy. It is Python-based, and since I am very familiar with PHP, I looked for alternatives.

I found a crawler called PHPCrawl. I am not sure whether it is just a crawler or whether it also provides scraping facilities. If it can be used for scraping, does it support XPath or regular expressions?

How does it compare with Scrapy, which is written in Python?

Please suggest which one is best to use for scraping websites.

Thanks


Solution

  • PHPCrawl is a pure crawler: it delivers the pages it finds and their source code to the user "as they are" (together with some context information). Therefore it's fast, it can use multiple processes, and it has tons of configuration options.

    I can't say much about Scrapy since I haven't used it so far.
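
Since a pure crawler only hands over the raw page source, the extraction step (XPath or regular expressions) has to be applied to that source separately. Here is a minimal sketch of that second step in Python, using only the standard library. The sample markup and the function names are made up for illustration, and this is neither PHPCrawl's nor Scrapy's API. Note that `xml.etree.ElementTree` only supports a limited XPath subset and needs well-formed markup; real pages usually need a more lenient HTML parser. In PHP, the built-in `DOMDocument` and `DOMXPath` classes are one way to do the same thing on the source the crawler delivers.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page source, standing in for what a crawler hands over.
PAGE_SOURCE = """<html>
  <body>
    <h1>Example product</h1>
    <ul>
      <li class="price">19.99</li>
      <li class="price">24.50</li>
    </ul>
    <a href="/next">Next page</a>
  </body>
</html>"""


def extract_prices_xpath(source):
    """Select the text of every <li class="price"> with an XPath-style query.

    Assumes the markup is well-formed, because ElementTree is an XML parser.
    """
    root = ET.fromstring(source)
    return [li.text for li in root.findall(".//li[@class='price']")]


def extract_links_regex(source):
    """Pull out href values with a regular expression (double-quoted only)."""
    return re.findall(r'href="([^"]+)"', source)


if __name__ == "__main__":
    print(extract_prices_xpath(PAGE_SOURCE))
    print(extract_links_regex(PAGE_SOURCE))
```

The point of the split is that crawling (finding and fetching pages) and scraping (pulling fields out of a page) are independent steps, so a pure crawler can be combined with whatever extraction method you prefer.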