I'm looking for a crawling tool, written in Java, to detect invalid URLs on our sites.
The difficulty is that many of the URLs are generated by JavaScript, CSS3 and Ajax, so just fetching the content of a site's URL wouldn't be enough.
The ideal would be a headless tool that can execute the JavaScript, apply the CSS styling, make the Ajax calls, and output the various URLs it accessed in doing so.
I do realize this is a tall order, but maybe it exists somewhere?
I suggest using HtmlUnit (http://htmlunit.sourceforge.net/), a headless browser for Java that executes JavaScript and Ajax calls, which is what it is made for.
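Here is a minimal sketch of how that could look, not a finished crawler. It assumes HtmlUnit 2.x is on the classpath (in 3.x the packages are named org.htmlunit instead of com.gargoylesoftware.htmlunit). The class name InvalidUrlFinder, the helpers isInvalid and crawl, the 5 second JavaScript wait and the example.com start URL are all made up for illustration. The idea is to wrap the WebClient's connection so that every request the page triggers, including those issued from JavaScript and Ajax, is recorded together with its HTTP status.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.WebConnectionWrapper;

import java.io.IOException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class InvalidUrlFinder {

    // Assumption: any 4xx or 5xx status counts as an invalid URL.
    static boolean isInvalid(int statusCode) {
        return statusCode >= 400;
    }

    // Loads startUrl in a headless browser and returns every URL that was
    // requested while doing so, mapped to the HTTP status it returned.
    static Map<String, Integer> crawl(String startUrl, long jsWaitMillis) throws IOException {
        // Synchronized because background JavaScript may issue requests from other threads.
        Map<String, Integer> statusByUrl = Collections.synchronizedMap(new LinkedHashMap<>());

        try (WebClient webClient = new WebClient()) {
            // Keep going on 404s and script errors instead of throwing.
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // The constructor installs this wrapper as the client's connection,
            // so all requests made by the client pass through getResponse.
            new WebConnectionWrapper(webClient) {
                @Override
                public WebResponse getResponse(WebRequest request) throws IOException {
                    WebResponse response = super.getResponse(request);
                    statusByUrl.put(request.getUrl().toString(), response.getStatusCode());
                    return response;
                }
            };

            webClient.getPage(startUrl);
            // Give pending Ajax calls and timers some time to finish.
            webClient.waitForBackgroundJavaScript(jsWaitMillis);
        }
        return statusByUrl;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default start URL; pass your own as the first argument.
        String startUrl = args.length > 0 ? args[0] : "http://example.com/";
        crawl(startUrl, 5_000).forEach((url, status) -> {
            if (isInvalid(status)) {
                System.out.println(status + " " + url);
            }
        });
    }
}
```

Note that this only records what is fetched while loading a single page. To cover a whole site you would still have to collect the links from each page and repeat the call for them. Also, as far as I know HtmlUnit does not render the page, so resources referenced only from CSS may not all be requested.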