Jaunt - check whether a specific element exists


I'm using Nutch to crawl a website and am currently writing a plugin. Jaunt 1.0.0.1 is used to parse the HTML. For example, I have this line:

Element infoBooksItem = body.findFirst("<div class=info_books_item>");

This throws an error when the page has no <div class=info_books_item>. I've been looking through the Jaunt JavaDocs, but I can't figure out how to check whether such an element exists.


Solution

  • You are correct that the findFirst method throws an exception if the element is not found. You can use a try-catch block to catch the NotFound exception in your code and take it from there, or, if you just need a boolean detector, you can write a helper method that does not throw an exception:

    public boolean has(Element element, String target){
      try{
        element.findFirst(target);
        return true;
      }
      catch(NotFound n){
        return false;
      }
    }
    

    Alternatively, you can use the findEvery method, which does not throw an Exception, as a boolean detector:

    if(body.findEvery("<div class=info_books_item>").size() > 0){
      // at least one matching element exists
    }
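
If you need the element itself and not just a yes/no answer, one option is to wrap the throwing lookup so that it returns an Optional, which avoids searching twice. The following is only a sketch of that pattern and deliberately does not use Jaunt: MissingException, ThrowingFinder, and tryFind are hypothetical stand-ins (for NotFound, for an object with a throwing findFirst, and for the helper), so the example can run on its own. In a real plugin you would presumably catch Jaunt's NotFound around element.findFirst(target) instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OptionalLookup {

    // Hypothetical stand-in for a checked "not found" exception (Jaunt's NotFound plays this role).
    static class MissingException extends Exception {
        MissingException(String query) {
            super("No match for: " + query);
        }
    }

    // Hypothetical stand-in for any object whose lookup throws instead of returning null.
    interface ThrowingFinder<T> {
        T findFirst(String query) throws MissingException;
    }

    // Runs the lookup once and converts "threw MissingException" into an empty Optional.
    static <T> Optional<T> tryFind(ThrowingFinder<T> finder, String query) {
        try {
            return Optional.of(finder.findFirst(query));
        } catch (MissingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A fake "page" that maps queries to content, used only for this demonstration.
        Map<String, String> page = new HashMap<>();
        page.put("<div class=info_books_item>", "book info");

        ThrowingFinder<String> finder = query -> {
            String hit = page.get(query);
            if (hit == null) {
                throw new MissingException(query);
            }
            return hit;
        };

        // The caller handles both cases without a try-catch of its own.
        tryFind(finder, "<div class=info_books_item>")
            .ifPresent(item -> System.out.println("Found: " + item));
        System.out.println(tryFind(finder, "<div class=missing>").isPresent());
    }
}
```

Compared with the boolean helper, this keeps the found value available to the caller, at the cost of a slightly more involved return type.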