Tags: java, jsoup

Get text from specific tags


Suppose I have the following HTML form:

<form action="action_page.php">
  First name:<br>
  <input type="text" name="firstname" value="Mickey" />
  <br>
  Last name:<br>
  <input type="text" name="lastname" value="Mouse" />
  <br><br>
  <input type="submit" value="Submit">
</form>

I want to print it in the same layout as the form shown in https://www.tutorialspoint.com/html/html_input_tag.htm, like this:

First name : .........

Last name : .........

I am able to get the input values. All I need is a way to read the "First name" and "Last name" text (that is, the text just before the input tags).

I have seen methods like .text() in jsoup, but they return all the text inside the tag. I only want that specific text. Thank you.


Solution

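  • To do this using Java's built-in DOM, you could do the following:

    This code finds, for each element in the document with the input tag, the first preceding text node that is not just whitespace. You can use Element#getAttribute("type") to check whether an input element is an actual text field rather than the submit button. Note that DocumentBuilder is an XML parser, so the markup must be well-formed XHTML (for example <br/> rather than <br>); otherwise parsing fails.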

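    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class InputLabels {
        public static void main(String[] args) throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Load the document (a File, InputSource or InputStream all work). DocumentBuilder is an
            // XML parser, so the markup must be well-formed, e.g. <br/> instead of <br>.
            Document document = builder.parse(new File("doc.xml"));
            // Loop through all elements with the input tag:
            NodeList nl = document.getElementsByTagName("input");
            for (int i = 0; i < nl.getLength(); i++) {
                Element e = (Element) nl.item(i); // getElementsByTagName only returns elements
                // Skip inputs that are not text fields, such as the submit button:
                if (!"text".equals(e.getAttribute("type")))
                    continue;
                Node previous = e;
                // Walk backwards through the siblings before the input element:
                while ((previous = previous.getPreviousSibling()) != null) {
                    if (previous.getNodeType() == Node.TEXT_NODE && !previous.getTextContent().trim().isEmpty()) {
                        System.out.println(previous.getTextContent().trim()); // Strip leading/trailing whitespace.
                        break; // Stop at the first non-blank text node.
                    }
                }
            }
        }
    }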
    

    Although I'm not familiar with jsoup, I would expect it to let you walk back through the previous sibling nodes in the same way as the code above.

    General note to readers

    I have used DOM rather than jsoup because the OP asked for this in the comments.
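As a rough sketch of how the same DOM approach could produce the "First name : ..." layout asked for in the question, the self-contained program below parses the question's form from a string and prints each label next to its input's value. It assumes the form has been rewritten as well-formed XHTML (<br/> instead of <br>), since DocumentBuilder only accepts XML. The class name FormLabels and the helper labelFor are hypothetical, and stripping the trailing colon from each label is just a formatting choice.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FormLabels {
    // The question's form, rewritten as well-formed XHTML (assumption: <br/> instead of <br>)
    // because DocumentBuilder is an XML parser and rejects unclosed tags.
    static final String FORM =
        "<form action=\"action_page.php\">\n"
        + "  First name:<br/>\n"
        + "  <input type=\"text\" name=\"firstname\" value=\"Mickey\"/>\n"
        + "  <br/>\n"
        + "  Last name:<br/>\n"
        + "  <input type=\"text\" name=\"lastname\" value=\"Mouse\"/>\n"
        + "  <br/><br/>\n"
        + "  <input type=\"submit\" value=\"Submit\"/>\n"
        + "</form>";

    // Hypothetical helper: returns the trimmed text of the nearest non-blank
    // text node before el, or "" if there is none.
    static String labelFor(Element el) {
        for (Node prev = el.getPreviousSibling(); prev != null; prev = prev.getPreviousSibling()) {
            if (prev.getNodeType() == Node.TEXT_NODE && !prev.getTextContent().trim().isEmpty()) {
                return prev.getTextContent().trim();
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(FORM)));
        NodeList inputs = doc.getElementsByTagName("input");
        for (int i = 0; i < inputs.getLength(); i++) {
            Element input = (Element) inputs.item(i);
            // Skip the submit button; otherwise it would pick up "Last name:" again.
            if (!"text".equals(input.getAttribute("type"))) {
                continue;
            }
            // Drop a trailing colon from the label so the output reads "First name : Mickey".
            String label = labelFor(input).replaceAll(":$", "").trim();
            System.out.println(label + " : " + input.getAttribute("value"));
        }
    }
}
```

Run against the form above, this should print one line per text input, pairing "First name" with Mickey and "Last name" with Mouse. The type check matters because walking back from the submit button would otherwise reach the "Last name:" text node again.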