
Change / replace a specific part of a URL inside a parsed table


I am parsing a table with simple_html_dom, and that part works. Now I want to change all links inside the table cells: they currently end with ".htm" and should end with ".php", so each link points to the same file name with a different extension. Since the content of the source file changes continuously, the replacement must work regardless of the file name.

Example:

<td><a href="www.website.com/site1.htm" ... --> <td><a href="www.website.com/site1.php"

This is my current code:

// Download simple_html_dom.php first from http://simplehtmldom.sourceforge.net/
require_once('simple_html_dom.php');
// Get the contents of the HTML document either using cURL, a crawling
// framework, or use the provided file_get_html() function.
$html = file_get_html('mywebsite/example.htm');


// Table 1
    $table = $html->find('table', 1);
    $rowData = array();

    foreach($table->find('tr') as $row) {
        // initialize array to store the cell data from each row
        $flight = array();
        foreach($row->find('td') as $cell) {
            // push the cell's inner HTML (including any <a> tags) to the array
            $flight[] = $cell->innertext;
        }
        foreach($row->find('th') as $cell) {
            // push the header cell's inner HTML to the array
            $flight[] = $cell->innertext;
        }
        $rowData[] = $flight;
    }
    foreach ($rowData as $row => $tr) {
        echo '<tr>';
        foreach ($tr as $td)
            echo '<td>' . $td .'</td>';
        echo '</tr>';
    }

The source looks like:

<table><hr>
<tr><th>po</th><th>player</th><th>age</th><th>2ga</th><th>2g%</th><th>fta</th><th>ft%</th><th>3ga</th><th>3g%</th><th>orb</th><th>drb</th><th>ast</th><th>stl</th><th>to</th><th>blk</th><th>o-o</th><th>d-o</th><th>p-o</th><th>t-o</th><th>o-d</th><th>d-d</th><th>p-d</th><th>t-d</th></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp><a href="JamesHarden7.htm">James Harden                    </a></td><td>27</td><td>48</td><td>53</td><td>95</td><td>85</td><td>85</td><td>35</td><td>20</td><td>59</td><td>99</td><td>57</td><td>1</td><td>12</td><td>4</td><td>9</td><td>7</td><td>9</td><td>8</td><td>6</td><td>5</td><td>7</td></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp><a href="TerryRozier1.htm">Terry Rozier                    </a></td><td>22</td><td>31</td><td>41</td><td>15</td><td>77</td><td>43</td><td>32</td><td>18</td><td>42</td><td>31</td><td>46</td><td>79</td><td>8</td><td>5</td><td>4</td><td>4</td><td>2</td><td>6</td><td>5</td><td>4</td><td>6</td></tr>
<tr><td CLASS=tdp>SG</td><td CLASS=tdp><a href="DannyGreen6.htm">Danny Green  

and so on...

Solution

  • You could use find("td a") to get the anchors in your example.

    Then loop over the results with foreach and replace the last 3 characters of each href attribute with "php", for example using substr_replace. Note that this assumes every matched href ends in "htm":

    $html = <<<HTML
     <table><hr>
    <tr><th>po</th><th>player</th><th>age</th><th>2ga</th><th>2g%</th><th>fta</th><th>ft%</th><th>3ga</th><th>3g%</th><th>orb</th><th>drb</th><th>ast</th><th>stl</th><th>to</th><th>blk</th><th>o-o</th><th>d-o</th><th>p-o</th><th>t-o</th><th>o-d</th><th>d-d</th><th>p-d</th><th>t-d</th></tr>
    <tr><td CLASS=tdp>PG</td><td CLASS=tdp><a href="JamesHarden7.htm">James Harden                    </a></td><td>27</td><td>48</td><td>53</td><td>95</td><td>85</td><td>85</td><td>35</td><td>20</td><td>59</td><td>99</td><td>57</td><td>1</td><td>12</td><td>4</td><td>9</td><td>7</td><td>9</td><td>8</td><td>6</td><td>5</td><td>7</td></tr>
    <tr><td CLASS=tdp>PG</td><td CLASS=tdp><a href="TerryRozier1.htm">Terry Rozier                    </a></td><td>22</td><td>31</td><td>41</td><td>15</td><td>77</td><td>43</td><td>32</td><td>18</td><td>42</td><td>31</td><td>46</td><td>79</td><td>8</td><td>5</td><td>4</td><td>4</td><td>2</td><td>6</td><td>5</td><td>4</td><td>6</td></tr>
    </table>
    HTML;
    
    $html = str_get_html($html);
    
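    // Replace the last three characters ("htm") of every link with "php"
    foreach ($html->find("td a") as $a) {
        $a->href = substr_replace($a->href, 'php', -3);
    }

    // Print the modified markup
    echo $html;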
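
    If some links in the table might not end in ".htm" (images, external links, ".html" pages), a stricter replacement could help. The following is a minimal sketch and not part of simple_html_dom: htmToPhp is a hypothetical helper name, and the sample hrefs other than JamesHarden7.htm are made up for illustration. It rewrites the extension only when ".htm" sits at the end of the href or right before a query string or fragment.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): rewrite a trailing
// ".htm" extension to ".php" and leave every other href unchanged.
function htmToPhp(string $href): string
{
    // Match ".htm" only at the end of the string or right before a
    // query string (?) or fragment (#). The limit of 1 stops after the
    // first match. Fall back to the input if preg_replace reports an error.
    return preg_replace('/\.htm(?=[?#]|$)/i', '.php', $href, 1) ?? $href;
}

// Illustrative hrefs: only the first one appears in the question.
$samples = ['JamesHarden7.htm', 'page.htm#stats', 'index.html', 'logo.png'];

foreach ($samples as $href) {
    echo $href, ' => ', htmToPhp($href), PHP_EOL;
}
```

    Inside the loop above, the assignment would then become $a->href = htmToPhp($a->href); so that hrefs with any other ending pass through untouched.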