I'm doing some screen scraping using WATIJ, but it can't read HTML tables (it throws NullPointerExceptions or UnknownObjectExceptions). To work around this, I read the HTML and run it through JTidy to get well-formed XML.
I want to query it with XPath, but the expression can't find a <table ...> by id, even though the table is there in the XML, plain as day. Here is my code:
XPathFactory factory=XPathFactory.newInstance();
XPath xPath=factory.newXPath();
InputSource inputSource = new InputSource(new StringReader(tidyHtml));
String expression = "//table[@id='searchResult']";
String table = xPath.evaluate(expression, inputSource);
System.out.println("table = " + table);
The table variable comes back as an empty String.
The table is in the XML, however. If I print the tidyHtml String, it shows:
<table
class="ApptableDisplayTag"
id="searchResult"
style="WIDTH: 99%">
I haven't used XPath before so maybe I'm missing something.
Can anyone set me straight? Thanks.
The solution was to drop WATIJ and switch to Google WebDriver. The WebDriver documentation describes how different browsers handle case in XPath expressions.
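For anyone who stays with the plain javax.xml.xpath API, here is a minimal sketch of how element-name case can make a lookup like the one above come back empty. It is not the original scraper: the class name XPathCaseDemo, the find helper, the ANY_CASE_TABLE constant, and the sample markup are all made up for illustration. It assumes the parsed document contains upper-case tag names, which may or may not be what JTidy produced here. It also asks for a Node rather than a String, so that "no match" (null) can be told apart from "matched, but the element has no text".

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathCaseDemo {
    // Hypothetical expression: lower-cases each element name with translate()
    // before comparing, so TABLE, Table and table are all accepted.
    public static final String ANY_CASE_TABLE =
        "//*[translate(local-name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
        + "'abcdefghijklmnopqrstuvwxyz') = 'table'][@id='searchResult']";

    // Returns the first node matching the expression, or null if nothing matches.
    public static Node find(String xml, String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // The reader can only be consumed once, so build a fresh InputSource per call.
        InputSource source = new InputSource(new StringReader(xml));
        try {
            return (Node) xPath.evaluate(expression, source, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Sample markup (an assumption): same id as in the question, upper-case tags.
        String upper =
            "<HTML><BODY><TABLE id=\"searchResult\"><TR><TD>x</TD></TR></TABLE></BODY></HTML>";

        // XPath name tests are case-sensitive, so a lower-case test does not match TABLE.
        System.out.println("lower-case test matched: "
            + (find(upper, "//table[@id='searchResult']") != null));

        // The translate() variant ignores the case of the element name.
        System.out.println("any-case test matched: " + (find(upper, ANY_CASE_TABLE) != null));
    }
}
```

If the any-case expression finds the table while //table[@id='searchResult'] does not, tag case is the likely culprit; if neither matches, the cause is something else.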