I am trying to get the text of the first three td tags in each row using the PHP Simple HTML DOM Parser and collect them in an array.
The table looks like this:
<table>
<tbody>
<tr>
<td>Floyd</td>
<td>Machine</td>
<td>Banking</td>
<td>HelpScout</td>
</tr>
<tr>
<td>Nirvana</td>
<td>Paper</td>
<td>Business</td>
<td>GuitarTuna</td>
</tr>
<tr>
<td>The edge</td>
<td>Tree</td>
<td>Hospital</td>
<td>Sician</td>
</tr>
.....
.....
</tbody>
</table>
What I am trying to achieve is to collect these in arrays, excluding the 4th td of each tr tag:
array(
    array(
        'art' => 'Floyd',
        'thing' => 'Machine',
        'passion' => 'Banking',
    ),
    array(
        'art' => 'Nirvana',
        'thing' => 'Paper',
        'passion' => 'Business',
    ),
    array(
        'art' => 'The edge',
        'thing' => 'Tree',
        'passion' => 'Hospital',
    ),
);
This is what I have tried:
require_once dirname( __FILE__ ) . '/library/simple_html_dom.php';
$html = file_get_html( 'https://www.example.com/list.html' );
$collect = array();
$list = $html->find( 'table tbody tr td' );
foreach( $list as $l ) {
    $collect[] = $l->plaintext;
}
$html->clear();
unset($html);
print_r($collect);
This gives all the tds in one flat array, which makes it difficult to identify the array keys I need. Is there a solution for this?
Instead of iterating over all td elements at once, you can iterate over each tr and, for each tr, iterate over its inner td elements and skip the 4th td:
$htmlString =<<<html
<table>
<tbody>
<tr>
<td>Floyd</td>
<td>Machine</td>
<td>Banking</td>
<td>HelpScout</td>
</tr>
<tr>
<td>Nirvana</td>
<td>Paper</td>
<td>Business</td>
<td>GuitarTuna</td>
</tr>
<tr>
<td>The edge</td>
<td>Tree</td>
<td>Hospital</td>
<td>Sician</td>
</tr>
</tbody>
</table>
html;
$html = str_get_html($htmlString);
// find all tr tags
$trs = $html->find('table tr');
$collect = [];
// for each tr tag, find its td children
foreach ($trs as $tr) {
    $tds = $tr->find('td');
    // skip rows with fewer than 3 td cells (e.g. header rows that use th)
    if (count($tds) < 3) {
        continue;
    }
    // collect the first 3 children and skip the 4th
    $collect[] = [
        'art' => $tds[0]->plaintext,
        'thing' => $tds[1]->plaintext,
        'passion' => $tds[2]->plaintext,
    ];
}
print_r($collect);
The output is:
Array
(
    [0] => Array
        (
            [art] => Floyd
            [thing] => Machine
            [passion] => Banking
        )

    [1] => Array
        (
            [art] => Nirvana
            [thing] => Paper
            [passion] => Business
        )

    [2] => Array
        (
            [art] => The edge
            [thing] => Tree
            [passion] => Hospital
        )

)
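If you would rather not hard-code the three indexes, the same row-by-row idea can be written so that the list of keys decides how many cells are kept. The sketch below is only an illustration and swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) in place of Simple HTML DOM so that it runs without an external library. The function name rowsToRecords and its $keys parameter are hypothetical names, not part of either library.

```php
<?php
// Sketch using PHP's bundled DOM extension instead of Simple HTML DOM.
// rowsToRecords() and $keys are hypothetical names chosen for illustration.

function rowsToRecords(string $htmlString, array $keys): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($htmlString);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $records = [];

    // Walk the table one row at a time.
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        // Only direct td children of the current row.
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Skip rows with fewer cells than keys (e.g. header rows using th).
        if (count($cells) < count($keys)) {
            continue;
        }
        // Keep only the leading cells and pair them with the keys.
        $records[] = array_combine($keys, array_slice($cells, 0, count($keys)));
    }

    return $records;
}

$htmlString = '<table><tbody>'
    . '<tr><td>Floyd</td><td>Machine</td><td>Banking</td><td>HelpScout</td></tr>'
    . '<tr><td>Nirvana</td><td>Paper</td><td>Business</td><td>GuitarTuna</td></tr>'
    . '<tr><td>The edge</td><td>Tree</td><td>Hospital</td><td>Sician</td></tr>'
    . '</tbody></table>';

$collect = rowsToRecords($htmlString, ['art', 'thing', 'passion']);
print_r($collect);
```

Passing a different key list, for example ['art', 'thing'], would keep only the first two cells of each row, so the 4th td never needs to be mentioned explicitly.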