
PHP Getting Text and Href from HTML page using XPATH


This is my first question on Stack ever, so forgive me if something is improper.

I have a webpage with a list of information I would like to extract. One of the td's also contains an <a> element, but I can't wrap my head around how to access it.

Example HTML:

<tbody>
  <tr>
   <td>
     19-10-2020 @ 17:33
   </td>
   <td class="hidden-xs hidden-sm">
    <a href="#" data-identifier="5f8db1c332ea9b22d375b7c0"></a>                                       
   </td>

Here is the example PHP I used to extract the other td's:

$xpath = new DOMXpath($document);

foreach ($xpath->evaluate('//table/tbody/tr') as $tr) {
    $i = 0;
    $row = [];
    foreach ($xpath->evaluate('td', $tr) as $td) {
        if ($i == 0) {
            // First cell: "19-10-2020 @ 17:33" -> "2020-10-19 17:33:00"
            $row['datumtijd'] = date_format(date_create(str_replace(" @", "", trim($td->nodeValue))), "Y-m-d H:i:s");
        }
        if ($i == 1) {
            print_r($td->nodeValue); // Completely empty
        }
        $i++;
    }
}

Any help is really appreciated.


Solution

  • Focusing only on extracting the data (and not on formatting, etc.), and assuming your HTML is fixed as shown below, try something along these lines:

    $str = '
    <tbody>
      <tr>
       <td>
         19-10-2020 @ 17:33
       </td>
       <td class="hidden-xs hidden-sm">
        <a href="#" data-identifier="5f8db1c332ea9b22d375b7c0"></a>                                       
       </td>
      </tr>
    </tbody>
    ';
    $doc = new DOMDocument();
    $doc->loadHTML($str);
    $doc = simplexml_import_dom($doc);
    $dates = $doc->xpath('//td[1]');
    $identifiers = $doc->xpath('//td/a[@href]/@data-identifier');
    
    foreach(array_combine($dates, $identifiers) as $date => $identifier) {
        echo trim($date) . "\n";
        echo trim($identifier) . "\n";
    }
    

    Output:

    19-10-2020 @ 17:33
    5f8db1c332ea9b22d375b7c0
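
The second cell prints as empty in the question because the `<a>` element has no text content; the values live in its `href` and `data-identifier` attributes. As an alternative, here is a minimal sketch that stays with DOMXPath and reads those attributes row by row. The function name `extractRows` and the array keys `href` and `identifier` are made up for illustration, and the sample HTML is wrapped in a `<table>` on the assumption that the real page has one (the question's XPath suggests so).

```php
<?php
// Hypothetical helper (not from the original post): returns one array per <tr>.
function extractRows(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($document);
    $rows = [];

    foreach ($xpath->query('//table/tbody/tr') as $tr) {
        // string(...) yields '' when the node or attribute is missing.
        $rawDate    = trim($xpath->evaluate('string(td[1])', $tr));
        $href       = $xpath->evaluate('string(td[2]/a/@href)', $tr);
        $identifier = $xpath->evaluate('string(td[2]/a/@data-identifier)', $tr);

        // "!" resets unparsed fields (e.g. seconds) to zero; "\@" matches a literal "@".
        $date = DateTime::createFromFormat('!d-m-Y \@ H:i', $rawDate);

        $rows[] = [
            'datumtijd'  => $date ? $date->format('Y-m-d H:i:s') : null,
            'href'       => $href,
            'identifier' => $identifier,
        ];
    }

    return $rows;
}

// Sample input based on the HTML in the question (assumed to sit inside a <table>).
$html = '
<table>
  <tbody>
    <tr>
      <td>
        19-10-2020 @ 17:33
      </td>
      <td class="hidden-xs hidden-sm">
        <a href="#" data-identifier="5f8db1c332ea9b22d375b7c0"></a>
      </td>
    </tr>
  </tbody>
</table>';

print_r(extractRows($html));
```

Collecting one array per row also keeps rows with identical timestamps separate, whereas `array_combine` keyed on the date would merge them into a single entry.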