I'm using Symfony DomCrawler to get all text in a document.
$this->crawler->filter('p')->each(function (Crawler $node, $i) {
    // process text
});
I'm trying to gather all the text within the <body>
that is outside of any child element.
<body>
This is an example
<p>
blablabla
</p>
another example
<p>
<span>Yo!</span>
again, another piece of text <br/>
with an annoying BR in the middle
</p>
</body>
I'm using PHP Symfony and can use XPath (preferred) or RegEx.
The string value of the entire document can be obtained with this simple XPath:
string(/)
All text nodes in the document would be:
//text()
The immediate text node children of body
(assuming the parsed document has html as its root element, as it does after HTML parsing) would be:
/html/body/text()
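Note that XPaths that select text nodes return a node-set, which may be converted to a string value depending on context. In XPath 1.0 (which PHP's DOMXPath implements), that conversion uses only the first node in the set, so iterate over the nodes if you need every piece of text.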
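As a rough illustration of the last expression, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath rather than DomCrawler, so it runs without Composer. It assumes the sample markup from the question wrapped in an explicit html element, which is why the path starts at /html/body; the variable names ($html, $texts) are just for this example. With DomCrawler, the same expression could probably be passed to filterXPath(), but that is not tested here.

```php
<?php
// Sample markup from the question, wrapped in <html> (an assumption for this sketch).
$html = <<<'HTML'
<html><body>
This is an example
<p>
blablabla
</p>
another example
<p>
<span>Yo!</span>
again, another piece of text <br/>
with an annoying BR in the middle
</p>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select only the text nodes that are direct children of <body>.
$texts = [];
foreach ($xpath->query('/html/body/text()') as $node) {
    $value = trim($node->nodeValue);
    // Skip whitespace-only nodes that sit between elements.
    if ($value !== '') {
        $texts[] = $value;
    }
}

print_r($texts);
```

Iterating over the node list, instead of asking XPath for a single string, keeps each piece of text separate, and trimming drops the whitespace-only nodes left between the paragraphs.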