screen-scraping

How do screen scrapers work?


I hear about people writing these programs all the time, and I know what they do, but how do they actually do it? I'm looking for general concepts.


Solution

  • Technically, screen scraping is done by any program that grabs the display output of another program and ingests it for its own use.

    Quite often, screen scraping refers to a web client that parses the HTML pages of a targeted website to extract structured data. This is done when the website does not offer an RSS feed or a REST API for accessing the data programmatically.

    One example of a library used for this purpose is Hpricot for Ruby, which is one of the better-architected HTML parsers used for screen scraping.
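
To make the general idea concrete, here is a minimal sketch in Python rather than Ruby, using only the standard library's `html.parser` module. The markup in `SAMPLE_HTML` and the names `LinkScraper` and `scrape_links` are made up for illustration. A real scraper would first download the page (for example with `urllib.request.urlopen`) and would probably use a more forgiving parser library.

```python
from html.parser import HTMLParser

# Hypothetical page markup, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <ul id="headlines">
    <li><a href="/story/1">First headline</a></li>
    <li><a href="/story/2">Second headline</a></li>
  </ul>
</body></html>
"""


class LinkScraper(HTMLParser):
    """Collects (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # extracted (href, text) pairs
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Parse an HTML string and return the links found in it."""
    parser = LinkScraper()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in scrape_links(SAMPLE_HTML):
        print(href, text)
```

The pattern is the same whatever the library: obtain the output meant for humans (here, HTML), walk its structure, and pull out the pieces you care about as data. The fragile part is that the scraper depends on the page layout, so it can break whenever the site changes its markup.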