Tags: php, html-table, html-parsing, simple-html-dom, domparser

Simple HTML DOM Parser - merge two rows into one


I'm trying to insert a table into a database, and I want to combine the rows that share a class into one array. Can anyone help me out?

<table>
<tr class="pair"><td>1</td><td>2</td></tr>
<tr class="pair"><td>3</td><td>4</td></tr>
<tr class="unpair"><td>1</td><td>2</td></tr>
<tr class="unpair"><td>3</td><td>4</td></tr>
</table>

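<?php
require('simple_html_dom.php');

// $html_string is assumed to hold the table markup shown above
$html = str_get_html($html_string);
$table = $html->find('table', 0);

// Collect the cells of all "pair" rows into a single array
$pairData = array();
foreach ($table->find('tr[class=pair]') as $rowpair) {
    foreach ($rowpair->find('td') as $cell) {
        $pairData[] = $cell->innertext;
    }
}

// Collect the cells of all "unpair" rows into a second array
$unpairData = array();
foreach ($table->find('tr[class=unpair]') as $rowunpair) {
    foreach ($rowunpair->find('td') as $cell) {
        $unpairData[] = $cell->innertext;
    }
}
?>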

I want to obtain:

<table>
<tr class="pair"><td>1</td><td>2</td><td>3</td><td>4</td></tr>
<tr class="unpair"><td>1</td><td>2</td><td>3</td><td>4</td></tr>
</table>

Solution

  • This should group all table rows by class.

    The basic logic is to loop through all the rows in a table and check whether the row's class has been seen before. If it hasn't, the code stores a reference to that row as the 'canonical' row for that class. If it has, the code copies all of the row's children to the canonical row and then removes the duplicate row.

    This approach should work for any number of tables in the HTML and any set of class names.
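
    The grouping logic itself does not depend on the DOM. If you only need the merged cells as arrays (for example, to insert them into a database as the question asks), a minimal sketch on plain PHP arrays could look like this. The helper name group_cells_by_class() and the ['class' => ..., 'cells' => ...] row shape are assumptions for illustration, not part of any library:

    ```php
    <?php
    // Hypothetical helper: applies the "first row with a class is canonical"
    // idea to plain arrays instead of DOM nodes.
    // Each input row is assumed to look like ['class' => 'pair', 'cells' => ['1', '2']].
    function group_cells_by_class(array $rows): array
    {
        $grouped = [];
        foreach ($rows as $row) {
            $class = $row['class'];
            if (!array_key_exists($class, $grouped)) {
                // First occurrence of this class: it becomes the canonical row
                $grouped[$class] = $row['cells'];
            } else {
                // Class seen before: append this row's cells to the canonical row
                $grouped[$class] = array_merge($grouped[$class], $row['cells']);
            }
        }
        return $grouped;
    }

    $rows = [
        ['class' => 'pair',   'cells' => ['1', '2']],
        ['class' => 'pair',   'cells' => ['3', '4']],
        ['class' => 'unpair', 'cells' => ['1', '2']],
        ['class' => 'unpair', 'cells' => ['3', '4']],
    ];
    print_r(group_cells_by_class($rows));
    ```

    Classes keep the order in which they first appear, and array_merge() reindexes the cells, so each class maps to a plain list that can be inserted as one record.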

    <?php
    
        $str = '<table><tr class="pair"><td>1</td><td>2</td></tr><tr class="pair"><td>3</td><td>4</td></tr><tr class="unpair"><td>1</td><td>2</td></tr><tr class="unpair"><td>3</td><td>4</td></tr>
        </table>';
    
    
        $doc = new DOMDocument();
        $doc->loadHTML($str);
    
    
        $tables = $doc->getElementsByTagName('table');
        foreach ($tables as $table) {
    
            # Group the TR elements of this table by class
            $table_classes = array();
            $rows = $table->getElementsByTagName('tr');
    
    
            $row_list = array();
            foreach ($rows as $row) {
                array_push($row_list, $row);
            }
    
            for($i=0; $i<count($row_list); $i++){
    
                $row = $row_list[$i];
                $row_class = $row->getAttribute('class');
    
                if(!array_key_exists($row_class, $table_classes)){
    
                    # If this is the first occurrence of that class, store this row as the original_row
                    $table_classes[$row_class] = $row;
    
                }else{
    
                    $original_row = $table_classes[$row_class];
    
                    # Copy the children (cells) over to the original row
                    foreach ($row->childNodes as $child) {
    
                        $clone = $child->cloneNode(true);
                        $original_row->appendChild($clone);
                    }
    
                    # Now delete the duplicate row, whose cells have been copied
                    $row->parentNode->removeChild($row);
    
    
                }
            }
    
        }
    
    
        echo htmlspecialchars($doc->saveXML());
    
    ?>
    

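    Returns (whitespace added for readability; saveXML() also prints the XML declaration, DOCTYPE and <html><body> wrapper around the table):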

    <table>
        <tr class="pair">
            <td>1</td>
            <td>2</td>
            <td>3</td>
            <td>4</td>
        </tr>

        <tr class="unpair">
            <td>1</td>
            <td>2</td>
            <td>3</td>
            <td>4</td>
        </tr>
    </table>
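
    If you want the output to contain only the table markup, one option is to pass each table node to DOMDocument::saveHTML(), which accepts an optional node argument (PHP 5.3.6+), instead of serializing the whole document with saveXML(). A sketch, where tables_to_html() is a hypothetical helper you would call in place of the final echo:

    ```php
    <?php
    // Hypothetical helper: serialize only the <table> elements of a document,
    // skipping the DOCTYPE and <html><body> wrapper that loadHTML() adds.
    function tables_to_html(DOMDocument $doc): string
    {
        $out = '';
        foreach ($doc->getElementsByTagName('table') as $table) {
            // Passing a node makes saveHTML() output just that subtree
            $out .= $doc->saveHTML($table);
        }
        return $out;
    }

    $doc = new DOMDocument();
    $doc->loadHTML('<table><tr class="pair"><td>1</td><td>2</td><td>3</td><td>4</td></tr></table>');
    echo htmlspecialchars(tables_to_html($doc));
    ```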