php html laravel guzzle goutte

Get entire HTML, not just text with Goutte


I'm parsing a website and have run into a problem: some of its text is split up with <br>, but when I use $node->text(), there isn't even a space in place of the <br>.

How can I get the <br> too, or at least replace it with a space?

The HTML is something like this:

<span>Some<br>Text</span>

Currently I get SomeText and I want it to be Some Text.

Thanks!


Solution

  • You can retrieve the HTML for that node instead of the text, and replace the <br> tags with spaces yourself. Something like this should do just fine:

    str_replace('<br>', ' ', strip_tags($node->html(), '<br>'));
    

    strip_tags removes every tag except <br>, so the result is equivalent to what the text() method returns, but with the line break tags kept. str_replace then replaces those tags with spaces. Assuming $node is the <span>, the above will transform this:

    <span>Some<br>Text</span>
    

    into this:

    Some Text
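
If the page might also contain <br/> or <br />, a regular expression can cover all the variants in one go. The following is only a sketch of that idea: htmlToTextWithSpaces is a made-up helper name (it is not part of Goutte or DomCrawler), and it expects the string you get from $node->html(). It also decodes entities, since html() leaves things like &amp; encoded where text() would not.

```php
<?php
// Hypothetical helper, not part of Goutte or DomCrawler: turns inner HTML
// into plain text, putting a space wherever a <br> variant appeared.
function htmlToTextWithSpaces(string $html): string
{
    // Strip every tag except <br>; this also keeps <br/> and <br />.
    $onlyBreaks = strip_tags($html, '<br>');

    // Replace <br>, <br/> and <br /> (any letter case) with a single space.
    $spaced = preg_replace('/<br\s*\/?>/i', ' ', $onlyBreaks);

    // Decode entities such as &amp; so the result reads like text() output.
    return trim(html_entity_decode($spaced, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}

// In the question's setting the argument would be $node->html().
echo htmlToTextWithSpaces('Some<br>Text'), PHP_EOL;
echo htmlToTextWithSpaces('<b>Some</b><br />Text'), PHP_EOL;
```

Both calls should print Some Text. If you would rather keep the line structure, replacing with "\n" instead of ' ' works the same way.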