
String of file_get_html can't be edited?


Consider this simple piece of code, which uses the PHP Simple HTML DOM Parser. It works as expected and outputs "current community".

<?php

    //PHP Simple HTML DOM Parser from simplehtmldom.sourceforge.net
    include_once('simple_html_dom.php');

    //Target URL
    $url = 'http://stackoverflow.com/questions/ask';

    //Getting content of $url
    $doo = file_get_html($url);

    //Passing the variable $doo to $abd
    $abd = $doo ;

    //Trying to find the word "current community"
    echo $abd->find('a', 0)->innertext; //Output: current community. 

?>

Now consider this second piece of code. It is the same as above, except that I append a single space to the parsed HTML content. (In the future I need to edit this string; I only append a space here to keep the example simple.)

<?php

    //PHP Simple HTML DOM Parser from simplehtmldom.sourceforge.net
    include_once('simple_html_dom.php');

    //Target URL
    $url = 'http://stackoverflow.com/questions/ask';

    //Getting content of $url
    $doo = file_get_html($url);

    //Passing the variable $doo to $abd - and adding an empty space.
    $abd = $doo . " ";

    //Trying to find the word "current community"
    echo $abd->find('a', 0)->innertext; //Outputs: nothing.     
?>

The second piece of code gives this error:

PHP Fatal error:  Call to undefined function file_get_html() in /home/name/public_html/code.php on line 5

Why can't I edit the string returned by file_get_html? I need to edit it for several important reasons (such as removing some scripts before processing the HTML content of the page). I also do not understand why it reports that file_get_html() could not be found, since the first piece of code shows that the correct parser is being included.

Additional note:

I have tried all of these variations:

include_once('simple_html_dom.php');
require_once('simple_html_dom.php');
include('simple_html_dom.php');
require('simple_html_dom.php');

Solution

  • file_get_html() returns an object, not a string. Concatenating a string to an object calls the object's __toString() method if it exists, and the result of the operation is a plain string. Strings do not have a find() method.

    If you want to do what you have described, read the file contents and concatenate the extra string first:

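    // Read the raw HTML as a plain string
    $content = file_get_contents('someFile.html');
    // Edit the string while it is still a string
    $content .= "someString";
    // Only now parse the edited string into a DOM object
    $domObject = str_get_html($content);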
    

    Alternatively, read the file with file_get_html() and manipulate it with the DOM API.
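
To make the first point concrete, here is a minimal, self-contained sketch of what happens when an object with a __toString() method is concatenated with a string. StandInDocument is a hypothetical stand-in written only for this illustration; it is not the real Simple HTML DOM class, and its find() merely reports whether a tag occurs in the markup. The sketch assumes PHP 7 or later, where calling a method on a string throws a catchable Error.

```php
<?php
// Hypothetical stand-in for the parser's document object (NOT the real
// Simple HTML DOM class): it has a find() method and a __toString() method.
class StandInDocument
{
    private $html;

    public function __construct($html)
    {
        $this->html = $html;
    }

    // Simplified lookup: only reports whether the tag occurs in the markup.
    public function find($tag)
    {
        return strpos($this->html, '<' . $tag) !== false;
    }

    // Called automatically when the object is used in a string context.
    public function __toString()
    {
        return $this->html;
    }
}

$doo = new StandInDocument('<a href="#">current community</a>');

// Plain assignment keeps the object, so find() is still available.
$abd = $doo;
$foundOnObject = $abd->find('a');

// Concatenation converts the object to a string via __toString().
$abd = $doo . " ";
$isString = is_string($abd);

// Calling a method on a string throws an Error (assumes PHP 7 or later).
$failed = false;
try {
    $abd->find('a');
} catch (Error $e) {
    $failed = true;
}

var_dump($foundOnObject, $isString, $failed);
```

The same reasoning suggests the order of operations in either approach above: do string edits before parsing, or keep the parsed object intact and convert it to a string only once you no longer need find().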