unit-testing, exception, screen-scraping, phpunit

Unit tests for screen-scraping?


I'm new to unit testing so I'd like to get the opinion of some who are a little more clued-in.

I need to write some screen-scraping code shortly. The target system is a web UI, so there'll be copious HTML parsing and similar volatile goodness involved. The target system will never notify me of changes (e.g. a site redesign or a change in functionality), so I anticipate my code breaking regularly.

So I think my real question is, how much, if any, of my unit testing should worry about or deal with the interface (the website I'm scraping) changing?

I think unit tests or not, I'm going to need to test heavily at runtime since I need to ensure the data I'm consuming is pristine. Even if I ran unit tests prior to every run, the web UI could still change between tests and runtime.

So do I focus on in-code testing and exception handling? Does that mean drawing a line in the sand and excluding this kind of testing from unit tests altogether?

Thanks


Solution

  • Unit tests should always be designed to produce repeatable, known results.

    Therefore, to unit test a screen-scraper, you should write the test against a known, fixed set of HTML (you may use a mock object to stand in for the live page).

    What you describe doesn't really sound like a scenario for unit testing to me. If you want to ensure your code runs as robustly as possible, then it is more, as you say, about in-code testing and exception handling.

    I would also include some alerting code, so the system makes you aware of any occasion when the HTML does not get parsed as expected.
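
The points above could be sketched roughly as follows. This is only an illustration in Python using the standard library; the same shape should carry over to PHPUnit. Everything named here is an assumption made for the example: the `price` CSS class, the `PriceParser` and `extract_prices` helpers, the `ScrapeError` exception, and the `scraper` logger are all hypothetical, not part of any real target site. The unit test runs against a fixed HTML string, while the runtime check logs and raises when the live page stops matching the expected layout.

```python
import logging
import unittest
from html.parser import HTMLParser

# Hypothetical logger name; hook your real alerting (email, pager) to it.
logger = logging.getLogger("scraper")


class ScrapeError(Exception):
    """Raised when the page no longer matches the layout the parser expects."""


class PriceParser(HTMLParser):
    """Collects the text of every <span class="price"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())


def extract_prices(html):
    """Parse prices out of html; log and raise ScrapeError if nothing usable is found."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    if not parser.prices:
        logger.error("no price elements found; the page layout may have changed")
        raise ScrapeError("no price elements found")
    try:
        return [float(p) for p in parser.prices]
    except ValueError as exc:
        logger.error("price text is not numeric: %r", parser.prices)
        raise ScrapeError("unparseable price text") from exc


# Fixed inputs: the test never touches the live site, so results are repeatable.
KNOWN_HTML = (
    '<ul><li><span class="price">19.99</span></li>'
    '<li><span class="price">5.00</span></li></ul>'
)
REDESIGNED_HTML = '<div class="cost">19.99</div>'


class ExtractPricesTest(unittest.TestCase):
    def test_known_html(self):
        self.assertEqual(extract_prices(KNOWN_HTML), [19.99, 5.0])

    def test_changed_layout_raises_and_logs(self):
        with self.assertLogs("scraper", level="ERROR"):
            with self.assertRaises(ScrapeError):
                extract_prices(REDESIGNED_HTML)
```

Saved as a module, the tests can be run with `python -m unittest`. At runtime, the calling code would catch `ScrapeError` around each scrape, so a silent site redesign surfaces as an alert rather than as bad data.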