
HTMLCleaner and XPath


Does HTMLCleaner support the XPath position() function and the use of predicates to denote positions?

My code is as follows:

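import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

// Fetch the page and let HtmlCleaner turn it into a cleaned tag tree.
HtmlCleaner htmlCleaner = new HtmlCleaner();
String sourceUrl = "http://jobs.alaska.gov/RR/WARN_notices.htm";
URL url = new URL(sourceUrl);
URLConnection urlConnection = url.openConnection();
TagNode rootTagNode = htmlCleaner.clean(new InputStreamReader(urlConnection.getInputStream()));

// xpathOne targets the header row (tr[1]); xpathTwo targets the first data row (tr[3]).
String xpathOne = "//table[2]/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[1]/td/div/span/text()";
// String xpathTwo = "//table[2]/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[3]/td/div/span/text()";
Object[] xPathNodes = rootTagNode.evaluateXPath(xpathOne);
// Object[] xPathNodes = rootTagNode.evaluateXPath(xpathTwo);

// Print every text node the expression matched.
for (Object object : xPathNodes) {
    System.out.println(object);
}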

xpathOne executes correctly and returns the table's header row. xpathTwo returns nothing, but it should return the first row of data in the table. Any help would be greatly appreciated. Thanks.


Solution

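  • Since xpathOne works with tr[1], the positional predicate itself does not seem to be the problem. I think there are no span elements in the row that tr[3] selects, so perhaps shortening the path to //table[2]/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[3]/td/div/text() is what you want.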
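
The effect described in the answer can be reproduced in isolation. The following is a minimal sketch that uses the JDK's built-in javax.xml.xpath engine instead of HtmlCleaner, run against a made-up, well-formed table rather than the real page. The class name XPathRowCheck, the SAMPLE markup, and the select helper are all hypothetical and exist only for illustration. It assumes, as the answer guesses, that the header cell wraps its text in a span while the data cells do not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathRowCheck {
    // Made-up sample (an assumption, not the real page): only the header
    // cell wraps its text in <span>; the other rows put text directly in <div>.
    static final String SAMPLE =
        "<table><tbody>"
      + "<tr><td><div><span>Company</span></div></td></tr>"
      + "<tr><td><div>spacer</div></td></tr>"
      + "<tr><td><div>Acme Corp</div></td></tr>"
      + "</tbody></table>";

    // Evaluates the expression with the JDK XPath engine (not HtmlCleaner)
    // and returns the value of every node it matched.
    static List<String> select(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Header row: the span exists, so the text is found.
        System.out.println(select(SAMPLE, "//table/tbody/tr[1]/td/div/span/text()")); // [Company]
        // Data row with the span step: nothing matches.
        System.out.println(select(SAMPLE, "//table/tbody/tr[3]/td/div/span/text()")); // []
        // Data row without the span step: the text is found.
        System.out.println(select(SAMPLE, "//table/tbody/tr[3]/td/div/text()"));      // [Acme Corp]
    }
}
```

In this sketch tr[3] matches in both of the last two queries; only the trailing span step decides whether anything comes back. A quick way to check the real page is to drop steps from the end of the expression one at a time until the result is no longer empty.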