Convert 2-column HTML table contents to 2d array


I am trying to parse the cell values of an HTML table to an indexed array of associative arrays with predetermined keys using PHP.

$htmlContent = '<table>
  <tr>
    <th>test1</th>
    <td>test1-1</td>
  </tr>
  <tr>
    <th>test2</th>
    <td>test2-2</td>
  </tr>
</table>';

I'd like this result:

[
    ['name' => "test1", 'value' => "test1-1"],
    ['name' => "test2", 'value' => "test2-2"],
]

My current result is only:

[
    ['test1' => 'test1-1', 'test2' => 'test2-2']
];

Here is my coding attempt:

$DOM = new DOMDocument();
$DOM->loadHTML($htmlContent);

$Header = $DOM->getElementsByTagName('th');
$Detail = $DOM->getElementsByTagName('td');

//#Get header name of the table
foreach($Header as $NodeHeader) 
{
    $aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
}
//print_r($aDataTableHeaderHTML); die();

//#Get row data/detail table without header name as key
$i = 0;
$j = 0;
foreach($Detail as $sNodeDetail) 
{
    $aDataTableDetailHTML[$j][] = trim($sNodeDetail->textContent);
    $i = $i + 1;
    $j = $i % count($aDataTableHeaderHTML) == 0 ? $j + 1 : $j;
}
//print_r($aDataTableDetailHTML); die();

//#Get row data/detail table with header name as key and outer array index as row number
for($i = 0; $i < count($aDataTableDetailHTML); $i++)
{
    for($j = 0; $j < count($aDataTableHeaderHTML); $j++)
    {
        $aTempData[$i][$aDataTableHeaderHTML[$j]] = $aDataTableDetailHTML[$i][$j];
    }
}
$aDataTableDetailHTML = $aTempData;
unset($aTempData);
print_r($aDataTableDetailHTML);
die();

Solution

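  • Your code is working too hard to keep the columnar data paired with its respective row. It collects every <th> and every <td> in the document separately and then uses the <th> text as array keys, so both rows collapse into a single associative array.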

    To make things easier, iterate the row (<tr>) elements, then access the elements within the given row.

    Code:

    $dom = new DOMDocument();
    $dom->loadHTML($htmlContent);
    $result = [];
    foreach ($dom->getElementsByTagName('tr') as $row) {
        $result[] = [
            'name' => $row->getElementsByTagName('th')->item(0)->nodeValue,
            'value' => $row->getElementsByTagName('td')->item(0)->nodeValue,
        ];
    }
    var_export($result);
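
If the real table might contain rows without a <th> or a <td> (a header-only row, for example), calling ->item(0)->nodeValue on a missing cell will fail. Below is a minimal sketch of a more defensive variant of the same row-by-row idea. The function name tableRowsToPairs is hypothetical and not part of the original answer, and the choice to silently skip incomplete rows and to trim whitespace is an assumption about what you want.

```php
<?php
// Hypothetical helper (not from the original answer): builds one
// ['name' => ..., 'value' => ...] entry per <tr> that has both a <th> and a <td>.
function tableRowsToPairs(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('tr') as $row) {
        // item(0) returns null when the row has no such cell.
        $th = $row->getElementsByTagName('th')->item(0);
        $td = $row->getElementsByTagName('td')->item(0);
        if ($th === null || $td === null) {
            continue; // assumption: incomplete rows are skipped
        }
        $result[] = [
            'name'  => trim($th->textContent),
            'value' => trim($td->textContent),
        ];
    }
    return $result;
}

$htmlContent = '<table>
  <tr>
    <th>test1</th>
    <td>test1-1</td>
  </tr>
  <tr>
    <th>test2</th>
    <td>test2-2</td>
  </tr>
</table>';

var_export(tableRowsToPairs($htmlContent));
```

Note that getElementsByTagName('tr') also matches rows of nested tables, so this sketch assumes a flat table like the one in the question.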