java, web-scraping, htmlunit

How to determine a page change with HTMLUnit?


I'm looking for a smart way to determine if a web site has changed since I last ran a check with HTMLUnit against it.

I'm using HTMLUnit to scrape some values from a web page. The scraping fails from time to time because the page's layout has changed. In these cases I want to be notified that the page looks different from how it looked on my last visit.

I thought about persisting the page object that I get via HTMLUnit, by simply writing it to a file. Next time I run my program, I could compare the fresh object with the persisted one.

Any opinions on this? Is there a smarter way to deal with this?


Solution

  • As there seems to be no smarter way to deal with this, I did what I suggested in my question: get the page, persist its source, and compare the persisted HTML source with the fresh one the next time I run the program.

    The downside is that it doesn't work with some pages, such as google.com, which seem to generate their markup dynamically, so the source can differ between requests even when nothing visible has changed. Most other sites work, though.
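
The persist-and-compare approach described above could look roughly like the sketch below. It is not the code from the answer: the class `PageChangeDetector` and its method `hasChanged` are hypothetical names, and it uses only the Java standard library so that it stays independent of the HTMLUnit version. It assumes the page source is already available as a string, for example from `page.asXml()` or `page.getWebResponse().getContentAsString()` on an HTMLUnit `HtmlPage`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of HTMLUnit: compares a page's source with
// the copy that was saved on the previous run.
final class PageChangeDetector {

    /**
     * Returns true if pageSource differs from the snapshot stored in
     * snapshotFile, then overwrites the snapshot with pageSource.
     * On the first run, when no snapshot exists yet, it returns false.
     */
    static boolean hasChanged(String pageSource, Path snapshotFile) {
        try {
            boolean changed = false;
            if (Files.exists(snapshotFile)) {
                // Load the source persisted by the previous run.
                String previous = new String(
                        Files.readAllBytes(snapshotFile), StandardCharsets.UTF_8);
                changed = !previous.equals(pageSource);
            }
            // Persist the fresh source for the next run.
            Files.write(snapshotFile, pageSource.getBytes(StandardCharsets.UTF_8));
            return changed;
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `PageChangeDetector.hasChanged(page.asXml(), Paths.get("snapshot.html"))` would then report a change once and treat the new source as the baseline afterwards. Because the comparison is an exact string match, it has the same limitation mentioned above: pages that generate their markup dynamically will be reported as changed on every run.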