Tags: php, html-parsing, simple-html-dom, html-table

Remove rowspan on tables in PHP


There is a table I want to load into a multi-dimensional array. The problem is that the table uses rowspan, so each row may contain a different number of cells. I therefore have to remove the rowspans and insert null (empty) values in place of the cells they cover.

This is the table I have (the original file has about 5,000 rows): [image: Original Table]

I need to turn the table into a regular grid like this in order to get a proper array: [image]

Removing the colspan values for the first line was easy, but my current method of removing rowspans sometimes produces extra values in the array.
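To make the problem concrete, here is a minimal sketch with a made-up three-column table (not the real stok.html). It uses PHP's bundled DOMDocument instead of simple_html_dom only so that it runs on its own; the table contents and variable names are illustrative assumptions.

```php
<?php
// Hypothetical table: the first cell spans two rows.
$html = '<table>
  <tr><td rowspan="2">A</td><td>B</td><td>C</td></tr>
  <tr><td>D</td><td>E</td></tr>
</table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

// Naive read: one array entry per <td>, ignoring rowspan entirely.
$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}

// The first row has 3 cells, the second only 2.
print_r(array_map('count', $rows));
```

In the second row "D" ends up at index 0 although it visually belongs to the second column, which is exactly the misalignment described above.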

My current PHP file for this:

<?php
ini_set('display_errors', true);
ini_set('mbstring.internal_encoding','UTF-8');
ini_set("memory_limit", "1024M"); 
ini_set('max_execution_time', 300);
include('simple_html_dom.php');

// Create a DOM object
$html = new simple_html_dom();

$html->load_file('stok.html');

$table = array();
$kac = array();

foreach($html->find('tr') as $row) {
        $satir = array();
        $j = 0;
        foreach($row->find('td') as $element) {
            if($kac[$j]['deger']>0){
                $satir[]='';
                $kac[$j]['deger']=$kac[$j]['deger']-1;
                $j++;
                while($kac[$j]['deger']>0){
                    $satir[]='';
                    $kac[$j]['deger']=$kac[$j]['deger']-1;
                    $j++;
                }
            }else{
                $j++;
                if(isset($element->rowspan)){
                    $kac[$j]['deger']=($element->rowspan)-1;
                }
                $satir[] = str_replace('&nbsp;', '', strip_tags($element->innertext));
            }

            if(isset($element->colspan)){
                $sayi=($element->colspan)-1;
                for($i=1;$i<=$sayi;$i++){
                    $satir[] = '';
                }
            }
        }
        $table[] = $satir;
}

echo '<pre>';
print_r($table);
echo '</pre>';
?>

My current output sample: some rows end at index 20, but others end at index 23 or 17. The correct row has 21 items (last index 20). I did not remove the table values from the example output.

Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
            [0] =>   Envanter (R/B/K)   (Filitre Kodu  :  sa)    (Envanter Tarihi :28/11/2012  )    (Depo : 100)
            [1] => 
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 
            [13] => 
            [14] => 
            [15] => 
            [16] => 
            [17] => 
            [18] => 
            [19] => 
            [20] => 
        )

    [2] => Array
        (
            [0] => Model
            [1] => Stok Adı
            [2] => R
            [3] => Renk Adı
            [4] => B
            [5] => B
            [6] => B
            [7] => B
            [8] => B
            [9] => B
            [10] => B
            [11] => B
            [12] => B
            [13] => B
            [14] => B
            [15] => B
            [16] => B
            [17] => B
            [18] => B
            [19] => Toplam
            [20] => Resim
        )

    [3] => Array
        (
            [0] => 
            [1] => 
            [2] => 
            [3] => 
            [4] => 34
            [5] => 36
            [6] => 38
            [7] => 40
            [8] => 42
            [9] => 44
            [10] => 46
            [11] => 48
            [12] => 50
            [13] => 52
            [14] => 54
            [15] => 56
            [16] => 58
            [17] => 60
            [18] => 62
            [19] => Toplam
            [20] => 
        )

    [4] => Array
        (
            [0] => 1K011621110
            [1] => NIHAN 2111 KABAN
            [2] => 064
            [3] => FES
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 1.00
            [13] => 
            [14] => 
            [15] => 
            [16] => 
            [17] => 
            [18] => 
            [19] => 1.00
            [20] => Resim
        )

    [5] => Array
        (
            [0] => 
            [1] => 
            [2] => Toplam :
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 
            [13] => 
            [14] => 
            [15] => 1.00
            [16] => 
            [17] => 
            [18] => 
            [19] => 
            [20] => 
            [21] => 
            [22] => 1.00
            [23] => 
        )

    [6] => Array
        (
            [0] => 
            [1] => 34
            [2] => 36
            [3] => 38
            [4] => 40
            [5] => 42
            [6] => 44
            [7] => 46
            [8] => 48
            [9] => 50
            [10] => 52
            [11] => 54
            [12] => 56
            [13] => 58
            [14] => 60
            [15] => 62
            [16] => Toplam
            [17] => 
        )

    [7] => Array
        (
            [0] => 1K011624760
            [1] => NIHAN 2476 KABAN
            [2] => 001
            [3] => SIYAH
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 1.00
            [10] => 
            [11] => 1.00
            [12] => 
            [13] => 
            [14] => 
            [15] => 
            [16] => 
            [17] => 
            [18] => 
            [19] => 2.00
            [20] => Resim
        )

Thanks in advance.


Solution

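  • Based on @deceze's helpful comment, I solved the issue a different way. Instead of counting pending rowspans per column, each cell reserves the slots it spans (to the right for colspan, in the following rows for rowspan), and later cells skip any slot that is already taken. The code below does the work, but it fills every empty or reserved cell with ***, so you may need to walk the whole array again afterwards to blank them out. (The code for this is located further below.)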

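    <?php
    include('simple_html_dom.php');

    // Create a DOM object
    $html = new simple_html_dom();
    
    $html->load_file('stok.html');
    
    $satir = array();   // result grid: $satir[row][column]
    $rowcount = 0;
    foreach($html->find('tr') as $row) {
            $colcount = 0;
            foreach($row->find('td') as $element) {
                // Skip slots already reserved by a rowspan/colspan of an earlier cell.
                // isset() avoids "undefined index" notices for slots that do not exist yet.
                while(isset($satir[$rowcount][$colcount])){
                    $colcount++;
                }
                $satir[$rowcount][$colcount] = strip_tags(str_replace('&nbsp;', '***', $element->innertext));
    
                // Reserve the extra columns covered by a colspan.
                if(isset($element->colspan)){
                    $sayi=($element->colspan)-1;
                    for($i=1;$i<=$sayi;$i++){
                        $satir[$rowcount][$colcount+$i] = '***';
                    }
                }
                // Reserve the same column in the following rows covered by a rowspan.
                if(isset($element->rowspan)){
                    $sayi=($element->rowspan)-1;
                    for($i=1;$i<=$sayi;$i++){
                        $satir[$rowcount+$i][$colcount] = '***';
                    }
                }
                $colcount++;
            }
            // Reserved slots may have been inserted before earlier columns; restore column order.
            if(isset($satir[$rowcount])){
                ksort($satir[$rowcount]);
            }
            $rowcount++;
    }
    
    echo '<pre>';
    print_r($satir);
    echo '</pre>';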
    

    The code block below replaces the asterisks mentioned above with empty strings.

    // Walk every row and column that exists, so no column count has to be hard-coded.
    foreach($satir as $i => $cells){
        foreach($cells as $j => $value){
            if($value === '***'){
                $satir[$i][$j] = '';
            }
        }
    }
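
The two passes above (fill with *** and then blank them out) can probably be folded into one. The following is only a sketch under assumptions: expandTable() is a hypothetical helper name, it uses PHP's bundled DOMDocument rather than simple_html_dom, it also counts th cells, and it reserves the full rowspan x colspan rectangle, which the code above does not do for a cell that carries both attributes. It has not been run against the original stok.html.

```php
<?php
/**
 * Hypothetical helper: expand rowspan/colspan into a rectangular array.
 * Reserved and missing slots are set to $filler (empty string by default).
 */
function expandTable(string $html, string $filler = ''): array
{
    libxml_use_internal_errors(true);   // tolerate imperfect real-world markup
    $doc = new DOMDocument();
    // The XML prefix is a common way to make loadHTML() treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $grid = [];
    $r = 0;
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $grid[$r] = $grid[$r] ?? [];    // keep rows that have no cells at all
        $c = 0;
        foreach ($tr->childNodes as $cell) {
            if (!($cell instanceof DOMElement)
                || !in_array(strtolower($cell->tagName), ['td', 'th'], true)) {
                continue;               // skip whitespace text nodes and other elements
            }
            // Skip slots already reserved by a span from an earlier cell.
            while (isset($grid[$r][$c])) {
                $c++;
            }
            // A missing attribute gives '', which casts to 0, so clamp to 1.
            $rowspan = max(1, (int) $cell->getAttribute('rowspan'));
            $colspan = max(1, (int) $cell->getAttribute('colspan'));
            // Reserve the whole rectangle covered by this cell.
            for ($dr = 0; $dr < $rowspan; $dr++) {
                for ($dc = 0; $dc < $colspan; $dc++) {
                    $grid[$r + $dr][$c + $dc] = $filler;
                }
            }
            // &nbsp; arrives as U+00A0 in textContent; drop it like the code above does.
            $grid[$r][$c] = trim(str_replace("\u{00A0}", '', $cell->textContent));
            $c += $colspan;
        }
        $r++;
    }

    // Pad every row to the widest row and put the columns in order.
    $width = 0;
    foreach ($grid as $row) {
        if ($row) {
            $width = max($width, max(array_keys($row)) + 1);
        }
    }
    foreach ($grid as $i => $row) {
        $grid[$i] = array_replace(array_fill(0, $width, $filler), $row);
    }
    ksort($grid);
    return $grid;
}

// Made-up example: "A" covers a 2x2 block.
$demo = '<table>'
      . '<tr><td rowspan="2" colspan="2">A</td><td>B</td></tr>'
      . '<tr><td>C</td></tr>'
      . '<tr><td>D</td><td>E</td><td>F</td></tr>'
      . '</table>';
$grid = expandTable($demo);
print_r($grid); // rows: [A, '', B], ['', '', C], [D, E, F]
```

Because every row is padded to the same width, the result can be indexed as $grid[row][column] without checking whether a slot exists. For the real file, something like expandTable(file_get_contents('stok.html')) would be the equivalent call.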