
How to get all text outside elements in an HTML document

I'm using Symfony DomCrawler to get all text in a document.

$this->crawler->filter('p')->each(function (Crawler $node, $i) {
    // process the text of each <p> element, e.g. $node->text()
});

I'm trying to gather all text within the <body> that is outside of any element, for example:

    This is an example
    another example
        again, another piece of text <br/>
        with an annoying BR in the middle
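To make the problem concrete, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument) instead of DomCrawler. The sample HTML is hypothetical, since the question's real markup is not shown: bare text sits directly in <body>, between elements. Collecting the text of every <p>, which is what filter('p') does, never sees that bare text.

```php
<?php
libxml_use_internal_errors(true); // tolerate real-world HTML without printing warnings

// Hypothetical sample document: bare text directly inside <body>, mixed with elements.
$html = '<html><body>This is an example<p>inside a paragraph</p>another example'
      . '<div>inside a div</div>again, another piece of text <br/>'
      . 'with an annoying BR in the middle</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Equivalent of filter('p')->each(...): only text inside <p> elements is visited.
$paragraphTexts = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphTexts[] = $p->textContent;
}

print_r($paragraphTexts);
```

The resulting array holds only "inside a paragraph"; none of the text that sits directly in <body> is collected.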

I'm using PHP with Symfony and can use either XPath (preferred) or a regex.


  • The string value of the entire document can be obtained with this simple XPath:
    string(/)

    All text nodes in the document would be:
    //text()

    The immediate text node children of body would be:
    /html/body/text()

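    Note that what you get back from the XPaths that select text nodes depends on the context: an API such as PHP's DOMXPath::query() returns a list of nodes to iterate over, while converting a node-set to a string in XPath 1.0 (e.g. string(//text())) yields only the first node's string value, so you typically concatenate the nodes yourself.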
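The three expressions above can be tried with a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath rather than DomCrawler. It assumes the HTML is parsed with loadHTML(), so the document has the html/body wrappers that /html/body/text() relies on. The sample markup is hypothetical, and the [normalize-space()] predicate (which drops whitespace-only text nodes, common in indented HTML) is an addition not present in the answer's expressions.

```php
<?php
libxml_use_internal_errors(true); // tolerate real-world HTML without printing warnings

// Hypothetical sample document (the question's real markup is not shown).
$html = '<html><body>This is an example<p>inside a paragraph</p>another example'
      . '<div>inside a div</div>again, another piece of text <br/>'
      . 'with an annoying BR in the middle</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// 1. String value of the whole document: all text, concatenated in document order.
$all = $xpath->evaluate('string(/)');

// 2. Every text node anywhere in the document, returned as a DOMNodeList.
$allTextNodes = $xpath->query('//text()');

// 3. Only text nodes that are direct children of <body>, i.e. text outside
//    any child element. The <br/> splits its surrounding text into two nodes.
$bodyText = [];
foreach ($xpath->query('/html/body/text()[normalize-space()]') as $node) {
    $bodyText[] = trim($node->nodeValue);
}

echo implode("\n", $bodyText), "\n";
```

Expression 3 is the one that answers the question: it prints the four pieces of bare body text, one per line, and skips everything inside <p> and <div>. The trim() call is a choice for this sample; drop it if leading or trailing whitespace matters.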