Tags: php, curl, simple-html-dom

PHP internal error in simple_html_dom function


I am trying to extract data from a website using cURL and simple_html_dom. The data contains a timetable with lectures and teachers. The code normally works, but sometimes it gives an internal error 500.

function parse($curl){
    $html = new simple_html_dom();
    $html->load($curl);
    $legend = $html->find('div.mainpage', 0)->children(6); // legend
    $table = $html->find('div.mainpage', 0)->children(3); // table body
    echo $table->outertext;
    echo $legend->outertext;
    echo "<p>";
    foreach ($html->find('td.rozvrh-pred') as $subject){
        $subjecttextname = $subject->children(0)->children(2)->innertext;
        // the internal error points to the next line, at the children() call
        $subjecttextlecture = $subject->children(0)->children(5)->children(0)->innertext;
        echo $subjecttextname." : ".$subjecttextlecture."<br>";

    }
    echo "</p>";
}

Is there any way to fix this?

[UPDATE] The data I am trying to access looks like this:

 <td  class="" align="left"><small></small></td><td  width="18" colspan="2" align="center" class="rozvrh-pred">
<small>
<a href="../mistnosti/?zobrazit_mistnost=922;zpet=../katalog/rozvrhy_view.pl?rozvrh_student=79992,zobraz=1;lang=en">ab300 (BA-MD-FEI A-B)</a><br/>
<a href="../katalog/syllabus.pl?predmet=313986;zpet=../katalog/rozvrhy_view.pl?rozvrh_student=79992,zobraz=1;lang=en">Algebraic structures</a>
&nbsp;
<sup>(1)</sup><br />
<i><a href="../lide/clovek.pl?id=733;zpet=../katalog/rozvrhy_view.pl?rozvrh_student=79992,zobraz=1;lang=en">TEACHER</a></i>
</small>
</td>

But how can I access the text values, for example "Algebraic structures" or "TEACHER"?


Solution

  • Test everything you get from simple_html_dom with is_object(). Example:

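    $html = str_get_html($str_html);
    if(!is_object($html)) { 
        //Log error or return error
        return false;
    }
    
    // Check the result of find() before calling children() on it
    $mainpage=$html->find('div.mainpage',0);
    if(!is_object($mainpage)) { 
        //Log error or return error
        return false;
    }
    
    $legend=$mainpage->children(6);
    if(!is_object($legend)) { 
        //Log error or return error
        return false;
    }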
    

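    find() with an index and children() with an index return null when nothing matches. If the result is not an object and you keep calling simple_html_dom methods on it, PHP raises a fatal error, which the server reports as the internal error 500.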
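
For the loop in the question, the same check can be applied to every step of the children() chain. Below is a minimal sketch, not a definitive fix: child_path() and print_subjects() are hypothetical helpers (they are not part of simple_html_dom), and the indexes are taken from the sample cell in the question, where the subject link is child 2 of <small> and the teacher link sits inside the <i> at child 5. It assumes $html is an already loaded simple_html_dom object.

```php
<?php
// Hypothetical helper, not part of simple_html_dom: follow a chain of
// children($index) calls and return null at the first missing step.
function child_path($node, array $indexes)
{
    foreach ($indexes as $index) {
        if (!is_object($node)) {
            return null; // a previous step found nothing
        }
        $node = $node->children($index);
    }
    return is_object($node) ? $node : null;
}

// Assumption: $html is an already loaded simple_html_dom object.
function print_subjects($html)
{
    foreach ($html->find('td.rozvrh-pred') as $subject) {
        $name    = child_path($subject, array(0, 2));    // <small> -> subject <a>
        $teacher = child_path($subject, array(0, 5, 0)); // <small> -> <i> -> teacher <a>
        if ($name === null || $teacher === null) {
            continue; // cell without a subject link or teacher: skip it
        }
        echo $name->innertext . " : " . $teacher->innertext . "<br>";
    }
}
```

With this, a cell that has no teacher entry is skipped instead of stopping the whole script. Whether to skip such cells or print them with an empty teacher name depends on what the timetable should show.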