I'm parsing a website and have a problem: some of its text is split up with <br>, but when I use $node->text(), there's not even a space in place of that <br>.
How can I get the <br> too, or at least replace it with a space?
The HTML is something like this:
<span>Some<br>Text</span>
Currently I get SomeText and I want it to be Some Text.
Thanks!
You can retrieve the HTML for that node instead of the text, and replace the <br> tags with spaces yourself. Something like this should do just fine:
str_replace('<br>', ' ', strip_tags($node->html(), '<br>'));
The strip_tags call is there to remove anything that's not <br>, so the result is the equivalent of the text() method, except that the line break tags are kept. Those can then be replaced with spaces using str_replace. The above will transform this:
<span>Some<br>Text</span>
into this:
Some Text
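If the markup might also contain <br/> or <br /> (str_replace only matches the literal <br>), a slightly more defensive variant could look like the sketch below. The function name textWithBreaksAsSpaces is made up for illustration, and the input is assumed to be the string you get from $node->html(). Decoding entities at the end is also an assumption about what you want: it brings the result closer to what text() returns, since the serialized HTML contains things like &amp;.

```php
<?php

/**
 * Hypothetical helper (not part of any library): turn an HTML fragment
 * into plain text, replacing every <br> variant with a single space.
 */
function textWithBreaksAsSpaces(string $html): string
{
    // Strip every tag except <br>, same idea as the one-liner above.
    $onlyBreaks = strip_tags($html, '<br>');

    // Match <br>, <br/>, <br /> in any letter case and swap each for a space.
    $spaced = preg_replace('/<br\s*\/?>/i', ' ', $onlyBreaks);

    // Assumption: entities such as &amp; should come back as plain characters.
    return html_entity_decode($spaced, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Usage with the fragment from the question (inner HTML of the span).
echo textWithBreaksAsSpaces('Some<br>Text'), "\n";
```

With a crawler node you would call it as textWithBreaksAsSpaces($node->html()).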