I'm using Nutch to crawl a website and am currently writing a plugin. I use Jaunt 1.0.0.1 to parse the HTML. For example, I have this line:
Element infoBooksItem = body.findFirst("<div class=info_books_item>");
This throws an error when the page contains no <div class=info_books_item>.
I've been looking through the Jaunt JavaDocs, but can't figure out how to check whether such an element exists.
You are correct that the findFirst method throws an exception if the element is not found. You can wrap the call in a try-catch block that catches the NotFound exception and take it from there, or, if you just need a boolean detector, you can write a helper method that does not throw:
public boolean has(Element element, String target) {
    try {
        element.findFirst(target);
        return true;
    }
    catch (NotFound n) {
        return false;
    }
}
Alternatively, you can use the findEvery method, which does not throw an Exception, as a boolean detector:
if (body.findEvery("<div class=info_books_item>").size() > 0) {
    // the element exists; handle it here
}
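If you need the element itself rather than just a yes/no answer, one option is a variant of the try-catch helper that returns an Optional, so the lookup runs only once. The sketch below does not use Jaunt directly: MissingElement, ThrowingLookup and tryFind are hypothetical names, and MissingElement is only a stand-in for a checked "not found" exception such as Jaunt's NotFound.

```java
import java.util.Optional;

public class OptionalLookup {

    /** Hypothetical stand-in for a checked "not found" exception (e.g. Jaunt's NotFound). */
    public static class MissingElement extends Exception {
        public MissingElement(String message) {
            super(message);
        }
    }

    /** Hypothetical interface: a lookup that returns a value or throws MissingElement. */
    @FunctionalInterface
    public interface ThrowingLookup<T> {
        T find() throws MissingElement;
    }

    /** Runs the lookup once and returns an empty Optional instead of propagating MissingElement. */
    public static <T> Optional<T> tryFind(ThrowingLookup<T> lookup) {
        try {
            return Optional.ofNullable(lookup.find());
        } catch (MissingElement e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A lookup that succeeds.
        Optional<String> hit = tryFind(() -> "info_books_item");
        // A lookup that fails the way findFirst does when nothing matches.
        Optional<String> miss = tryFind(() -> { throw new MissingElement("no such div"); });

        System.out.println(hit.isPresent());
        System.out.println(miss.isPresent());
    }
}
```

In a Jaunt plugin you would presumably catch NotFound in place of MissingElement and pass something like () -> body.findFirst("<div class=info_books_item>") as the lookup; that adaptation is an assumption and is not shown in the sketch.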