I'm using PHP Simple HTML DOM Parser to extract cell values from an HTML table and store them in an array.
HTML:
<td class="inflexion">so<span class="deviation">y</span></td>
<td class="inflexion"><span class="deviation">fui</span></td>
<td class="inflexion"><span class="deviation">er</span>a</td>
<td class="inflexion">haber sería</td>
Desired output:
soy
fui
era
haber sería
PHP:
function getvariations($conjtables){
    $conjtables = str_get_html($conjtables);
    $variations = [];
    foreach ($conjtables->find('td[class=inflexion]') as $inflexion) {
        $variations[] = $inflexion->plaintext;
    }
    return array_unique($variations);
}
$variations = getvariations($conjtables);
foreach ($variations as $variation) {
    echo $variation . '<br>';
}
This works; however, in some cells the output contains an undesired space next to the span element's text (see the third item below):
soy
fui
er a
haber sería
Any suggestions for fixing this? I cannot remove spaces arbitrarily, because some cells genuinely contain multiple words, as in the last item of the example.
Use innertext with strip_tags instead of plaintext:
function getvariations($conjtables){
    $conjtables = str_get_html($conjtables);
    $variations = [];
    foreach ($conjtables->find('td[class=inflexion]') as $inflexion) {
        // innertext keeps the raw inner HTML; strip_tags removes the span markup
        $variations[] = strip_tags($inflexion->innertext);
    }
    return array_unique($variations);
}
$variations = getvariations($conjtables);
foreach ($variations as $variation) {
    echo $variation . '<br>';
}
Output:
soy
fui
era
haber sería
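To see why this works without involving the parser, strip_tags can be tried directly on the inner HTML of the sample cells. The following is only a minimal sketch: the $cells array is typed out by hand from the question's HTML, and cell_text is a hypothetical helper name, not part of Simple HTML DOM. It also decodes HTML entities and trims whitespace, which is an assumption about what you may want if the real table contains entities such as &iacute;.

```php
<?php
// Inner HTML of the sample cells, written out by hand for illustration.
$cells = [
    'so<span class="deviation">y</span>',
    '<span class="deviation">fui</span>',
    '<span class="deviation">er</span>a',
    'haber sería',
];

// Hypothetical helper (not part of Simple HTML DOM):
// remove the tags, decode any HTML entities, and trim surrounding whitespace.
function cell_text(string $innerHtml): string
{
    $text = strip_tags($innerHtml);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim($text);
}

// array_unique keeps the original keys, so array_values re-indexes the result.
$variations = array_values(array_unique(array_map('cell_text', $cells)));
print_r($variations);
```

Because strip_tags only removes the markup and never inserts characters, the text on either side of a span is joined exactly as it appears in the source, while real spaces such as the one in "haber sería" are left alone.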