
How to get text of first HTML element


How can I traverse the HTML DOM recursively and get the text of each element? I need only the text that belongs to the element itself, without the text of its child elements. In other words, when HTML elements are nested, how do I get the text of the outer element while excluding the nested children and their text?

I have tried "elem.InnerHTML", "elem.InnerTEXT", and "elem.TextContent", but all of these seem to return the text of the nested elements as well.

Code sample: I have HTML as below:

<HTML>
    <HEAD></HEAD>
    <BODY>
        <DIV> SOMEDIVTEXT 
            <TABLE>
              <TBODY>
               <TR><TD>COLUMN1</TD></TR>
               <TR><TD>COLUMN2</TD></TR>
              </TBODY>
            </TABLE>
        </DIV>
    </BODY>
</HTML>

I just need to extract SOMEDIVTEXT while the current node pointer is at the DIV, without getting the text of its nested children.


Solution

  • Okay, so assuming (1) you're writing JavaScript in the browser, and (2) you have the element as an object (you mentioned 'elem' in the question, so I guess you have?), then you can get the children of an element using elem.childNodes. Note that childNodes is a property, not a method, so there are no parentheses.

    This will give you a NodeList object containing each node directly within the element. In the case of the HTML you quoted in the question, the first node will be a text node containing the text SOMEDIVTEXT (plus the surrounding whitespace), and the second will be an element node for the <TABLE> element. Because of the line break between </TABLE> and </DIV>, there will also be a trailing text node that contains only whitespace.

    So elem.childNodes[0] is the text node you are after, and elem.childNodes[0].nodeValue gives you its text. Trim the result to drop the surrounding whitespace.

    But the DOM is pretty flexible, so there are other properties and methods that can also get the same effect, including 'elem.firstChild' as mentioned in another answer.
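
The approach above can be sketched as a small helper. This is only an illustration, assuming browser-style nodes that expose childNodes, nodeType, nodeValue and tagName; the names getOwnText and collectOwnTexts are made up for this example and are not part of the DOM API. The first function joins an element's direct text nodes, and the second walks the tree recursively, as the question asks.

```javascript
// Numeric node types as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: returns only the text that sits directly inside `elem`,
// ignoring the text of any nested child elements.
function getOwnText(elem) {
  let text = "";
  for (const node of Array.from(elem.childNodes)) {
    if (node.nodeType === TEXT_NODE) {
      text += node.nodeValue;
    }
  }
  // Drop the whitespace that comes from indentation in the markup.
  return text.trim();
}

// Hypothetical helper: visits `elem` and every descendant element in document
// order, recording each element's tag name and its own text.
function collectOwnTexts(elem, out = []) {
  out.push({ tag: elem.tagName, text: getOwnText(elem) });
  for (const node of Array.from(elem.childNodes)) {
    if (node.nodeType === ELEMENT_NODE) {
      collectOwnTexts(node, out);
    }
  }
  return out;
}
```

In a browser, something like getOwnText(document.querySelector("div")) should give SOMEDIVTEXT for the HTML in the question, while collectOwnTexts would report an empty string for TABLE, TBODY and TR, and COLUMN1 and COLUMN2 for the two TD elements.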