I need to scrape the data from an HTML table and orient the columnar data as rows of a 2D array.
My code does not produce the correct structure.
HTML Table:
<html>
<head>
</head>
<body>
<table>
    <tbody>
        <tr>
            <td>header</td>
            <td>header</td>
            <td>header</td>
        </tr>
        <tr>
            <td>content</td>
            <td>content</td>
            <td>content</td>
        </tr>
        <tr>
            <td>test</td>
            <td>test</td>
            <td>test</td>
        </tr>
    </tbody>
</table>
</body>
</html>
PHP CODE:
$DOM = new \DOMDocument();
$DOM->loadHTML($valdat["table"]);
$Header = $DOM->getElementsByTagName('tr')->item(0)->getElementsByTagName('td');
$Detail = $DOM->getElementsByTagName('td');
//#Get header name of the table
foreach($Header as $NodeHeader)
{
    $aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
}
//print_r($aDataTableHeaderHTML); die();
//#Get row data/detail table without header name as key
$i = 0;
$j = 0;
foreach($Detail as $sNodeDetail)
{
    $aDataTableDetailHTML[$j][] = trim($sNodeDetail->textContent);
    $i = $i + 1;
    $j = $i % count($aDataTableHeaderHTML) == 0 ? $j + 1 : $j;
}
//print_r($aDataTableDetailHTML); die();
//#Get row data/detail table with header name as key and outer array index as row number
for($j = 0; $j < count($aDataTableHeaderHTML); $j++)
{
    for($i = 1; $i < count($aDataTableDetailHTML); $i++)
    {
        $aTempData[][$aDataTableHeaderHTML[$j]][] = $aDataTableDetailHTML[$i][$j];
    }
}
$aDataTableDetailHTML = $aTempData;
echo json_encode($aDataTableDetailHTML);
My result:
[{"header":["content"]},{"header":["test"]},{"header":["content"]},{"header":["test"]},{"header":["content"]},{"header":["test"]}]
The result I need:
[
["header","content","test"],
["header","content","test"],
["header","content","test"]
]
I've changed a lot of the code to (hopefully) simplify it. This works in two stages. The first extracts the <tr> elements and builds an array of the <td> values in each row, storing the results in $rows.
The second ties the data together vertically by looping over the first row and using array_column() to extract the corresponding column from all of the rows...
// Stage 1: collect the text of every <td>, grouped by its <tr>
$trList = $DOM->getElementsByTagName("tr");
$rows = [];
foreach ( $trList as $tr ) {
    $row = [];
    foreach ( $tr->getElementsByTagName("td") as $td ) {
        $row[] = trim($td->textContent);
    }
    $rows[] = $row;
}
// Stage 2: for each column index in the first row, pull that column out of every row
$aDataTableDetailHTML = [];
foreach ( $rows[0] as $col => $value ) {
    $aDataTableDetailHTML[] = array_column($rows, $col);
}
echo json_encode($aDataTableDetailHTML);
Which with the test data gives...
[["header","content","test"],["header","content","test"],["header","content","test"]]
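If you want to try this in isolation, here is a minimal self-contained sketch of the same two stages. The function name tableColumnsAsRows() and the sample cell values (h1, a1, b1, ...) are my own, chosen only so that each column is distinguishable in the output; it assumes every row has the same number of <td> cells as the first one.

```php
<?php
// Hypothetical helper (name is illustrative): returns the table's columns
// as the rows of a 2D array. Assumes all rows have as many cells as row 0.
function tableColumnsAsRows(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Stage 1: one array of trimmed cell texts per <tr>.
    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $row = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $row[] = trim($td->textContent);
        }
        $rows[] = $row;
    }

    if ($rows === []) {
        return [];
    }

    // Stage 2: for each column index of the first row,
    // collect that column from every row.
    $columns = [];
    foreach (array_keys($rows[0]) as $col) {
        $columns[] = array_column($rows, $col);
    }

    return $columns;
}

// Sample data (made up for illustration).
$html = '<table><tbody>'
    . '<tr><td>h1</td><td>h2</td><td>h3</td></tr>'
    . '<tr><td>a1</td><td>a2</td><td>a3</td></tr>'
    . '<tr><td>b1</td><td>b2</td><td>b3</td></tr>'
    . '</tbody></table>';

echo json_encode(tableColumnsAsRows($html));
// [["h1","a1","b1"],["h2","a2","b2"],["h3","a3","b3"]]
```

Note that array_column() silently skips rows that lack the requested index, so a table with ragged rows would give columns of different lengths rather than an error.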