php, dom, simple-html-dom

PHP Simple HTML DOM getting two different cell classes (inline) but one may be empty


Okay, I searched all over for this and tried many things (for 3 hours), but I am still stumped.

Here is my HTML snippet from somepage.com.

<table class="infotable" cellpadding="0" cellspacing="0" width="185">
    <tr>
        <td class="what">First name:</td>
        <td class="whatdet">Jim</td>
    </tr>
    <tr>
        <td class="what">Last name:</td>
        <td class="whatdet">Bo</td>
    </tr>
    <tr>
        <td class="what">Age:</td>
        <td class="whatdet"></td> <!--PROBLEM IS HERE WITH EMPTY CELL-->
    </tr>
    <tr>
        <td class="what">Sex:</td>
        <td class="whatdet">Rarely</td>
    </tr>
    <tr>
        <td class="what">City:</td>
        <td class="whatdet"></td> <!--PROBLEM IS HERE WITH EMPTY CELL-->
    </tr>
    <tr>
        <td class="what">State:</td>
        <td class="whatdet">California</td>
    </tr>
</table>

Here is my code snippet, with an isset() test I added in an attempt to get a little more information. (Yes, I am obviously clueless.)

require_once 'simpledom/simple_html_dom.php';
$html = file_get_html('http://somepage.com/');
$i=0;
$tabletitles = array(); /* Get the titles 'what' Cell Names */
$tabledetails = array(); /* Get the Details in 'whatdet' Cells */
$tables = $html->find('table[@class="infotable"]'); /* Where both reside in */

foreach($tables as $table) {
    $titles = $table->find('td[@class="what"]');
    $titlesd = $table->find('td[@class="whatdet"]');

    foreach($titles as $title)  {
        /* UPDATE: noticed a problem with characters like $, so I added this. */
        /* I will do the same for $titlesd if I can figure out how to get it. */

        $title1 = preg_replace('/([?#^&*()$\\/])/', '\\\\$1', $title);

        echo $title1; /*Changed from $title*/

        if (isset($titlesd[$i])) /*this is just for testing*/
            echo $titlesd[$i].' is either 0, empty, or not set at all';

        /* WHAT I WANT is: echo '<tr><td>'. $title .'</td><td>'. $titlesd[$i] .'</td></tr>'; */
        $i++;
    }
}

What I am trying to get:

------------|----------
First name  | Jim
----------- |---------
Last name   | Bo
----------- |---------
Age         |
----------- |---------
Sex         | Rarely
----------- |---------
City        |
----------- |---------
State       | California
----------- |---------

But what I get now:

------------|----------
First name  | Jim
----------- |---------
Last name   | Bo
----------- |---------
Age         | Rarely
----------- |---------
Sex         | California
----------- |---------
City        |
----------- |---------
State       | 
----------- |---------

I cannot seem to figure out how to assign a "blank" to $titlesd[$i], or how to skip it in the loop, so I keep getting undesired results (to say the least).

So again, I beseech a guru here to give me another highly prized lesson. Thank you.


Solution

  • If I'm not mistaken, this is what you are trying to do:

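    require_once 'simpledom/simple_html_dom.php';
    $html = file_get_html('http://somepage.com/');

    foreach ($html->find('table[@class="infotable"]') as $table) {
        /* Walk the table row by row so each title stays paired with its own detail cell */
        foreach ($table->find('tr') as $line) {
            /* With an index, find() returns a single node, or NULL when nothing matches */
            $titles = $line->find('td[@class="what"]', 0);
            $titlesd = $line->find('td[@class="whatdet"]', 0);

            /* plaintext is the cell text only; echoing the node itself would include the original <td> tags */
            echo '<tr>'
                    .'<td>'.htmlspecialchars($titles ? $titles->plaintext : '').'</td>'
                    .'<td>'.htmlspecialchars($titlesd ? $titlesd->plaintext : '').'</td>'
                .'</tr>';
        }
    }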
    

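    A bit of explanation: searching row by row keeps each title paired with the detail cell from its own row, so an empty cell cannot shift the values that follow it.

    Note: ->find() with an index returns NULL when nothing matches, so in the worst case $titlesd is NULL and its cell is simply displayed empty.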
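
For comparison, here is a minimal sketch of the same row-by-row pairing using PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. This is a swapped-in technique, not the code from the answer above. The helper name pairInfoTableRows() is hypothetical, and the markup is the table from the question pasted inline as an assumption, rather than fetched from somepage.com.

```php
<?php
// Sample markup copied from the question; the Age and City value cells are empty.
$markup = <<<'HTML'
<table class="infotable">
    <tr><td class="what">First name:</td><td class="whatdet">Jim</td></tr>
    <tr><td class="what">Last name:</td><td class="whatdet">Bo</td></tr>
    <tr><td class="what">Age:</td><td class="whatdet"></td></tr>
    <tr><td class="what">Sex:</td><td class="whatdet">Rarely</td></tr>
    <tr><td class="what">City:</td><td class="whatdet"></td></tr>
    <tr><td class="what">State:</td><td class="whatdet">California</td></tr>
</table>
HTML;

/**
 * Hypothetical helper (not part of any library): returns [label => value]
 * pairs, always reading both cells from the same <tr>.
 */
function pairInfoTableRows(string $markup): array
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($markup);
    $xpath = new DOMXPath($doc);

    $pairs = [];
    foreach ($xpath->query('//table[@class="infotable"]//tr') as $row) {
        // Relative queries are evaluated against the current row only.
        $label = $xpath->query('td[@class="what"]', $row)->item(0);
        $value = $xpath->query('td[@class="whatdet"]', $row)->item(0);
        if ($label === null) {
            continue; // row without a label cell: nothing to pair
        }
        // A missing or empty value cell becomes an empty string.
        $pairs[trim($label->textContent)] = $value === null ? '' : trim($value->textContent);
    }
    return $pairs;
}

$rows = pairInfoTableRows($markup);
foreach ($rows as $label => $value) {
    echo '<tr><td>' . htmlspecialchars($label) . '</td><td>'
        . htmlspecialchars($value) . "</td></tr>\n";
}
```

Because each value is looked up inside its own row, the empty Age and City cells come back as empty strings and the later rows keep their own values.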