
Best screen scraper: Simple HTML DOM or Snoopy?


Which one is better for screen scraping: Simple HTML DOM or Snoopy? I use Simple HTML DOM and find it comfortable. Does Snoopy have any advantage over Simple HTML DOM?

My requirements: I want to scrape content from a page after logging in. Simple HTML DOM is easy to use, but it takes a long time to print the results.


Solution

  • Is Snoopy really that well-known and mature a package?

    If it's not, then all other things being equal, I'd probably go with generic HTML DOM code - especially if the scraping is somewhat simple.

    But only you know when your code is starting to get too big, unmanageable, etc., at which point it might be better to look at another tool out there like Snoopy.

    (Which, admittedly, I don't have experience with; it's apparently at http://sourceforge.net/projects/snoopy/ for those not familiar with it - "Snoopy is a PHP class that simulates a web browser. It automates the task of retrieving web page content and posting forms, for example.")

    The real reason I'm posting, even though I don't know Snoopy per se and thus can't definitively answer your question, is to ask if you've considered using Selenium (http://www.seleniumhq.org/) instead of Snoopy.

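    Selenium is a fairly well-known browser automation and testing tool, and it occurred to me that one of the nice things about using it for what you're doing (if you can) is that it's built around testing: the same script that pulls content from the page can also assert that the content is what you expected.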

    The reason that's good is that screen scraping is an inherently brittle task - if the target site changes something, blam, your scraping fails. So it's a nice design to have an automated scrape/test-that-scraping-worked system.

    Something to think about, anyway.
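
To make the scrape/test-that-scraping-worked idea a bit more concrete, here is a minimal sketch. It is an illustration only: it's written in Python with the standard library's html.parser rather than Simple HTML DOM, Snoopy or Selenium, and the names ClassTextExtractor, scrape, check_scrape, SAMPLE and CHANGED, along with the "item" class, are all made up for the example. It assumes the content you want sits in elements marked with a known class.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; ignored when tracking nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while we are inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1              # entering a new match
            self.results.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def scrape(html, css_class):
    """Return the stripped text of each element with the given class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


def check_scrape(values):
    """Return a list of problems; an empty list means the scrape looks sane."""
    problems = []
    if not values:
        problems.append("no elements matched; the page layout may have changed")
    if any(not value for value in values):
        problems.append("at least one matched element was empty")
    return problems


# Hypothetical pages: the layout we expect, and one where the class was renamed.
SAMPLE = '<ul><li class="item">First</li><li class="item">Second <b>bold</b></li></ul>'
CHANGED = '<ul><li class="entry">First</li></ul>'

if __name__ == "__main__":
    for name, page in (("sample", SAMPLE), ("changed", CHANGED)):
        values = scrape(page, "item")
        print(name, values, check_scrape(values) or "ok")
```

The point of keeping check_scrape separate is that you can run it after every scrape and get an alert when the target site changes, instead of silently storing empty results.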