
Removing unwanted elements from a table with simple_html_dom


I am fetching a page that contains some style tags, a table, and other non-vital content. I store the result in a transient and retrieve it with AJAX:

$result_match = file_get_contents( 'http://www.example.com' );

set_transient( 'match_results_details', $result_match, 60 * 60 * 12 );

$match_results = get_transient( 'match_results_details' );

if ( $match_results != '') {

    $html = new simple_html_dom();
    $html->load($match_results);

    $out = '';

    $out .= '<div class="match_info_container">';
    if (!empty($html) && is_object($html)) {
        foreach ($html->find('table') as $table => $table_value) {
            $out .= preg_replace('/href="?([^">]+)"/', '', $table_value);
        }
    }
    $out .= '</div>';

    wp_die ( $out );

} else {
    $no_match_info = esc_html__('No info available', 'kompisligan');
    wp_die($no_match_info);
}

The table contained anchors that I needed to neutralize, so I used preg_replace to find every href and strip it out. I know the contents can be manipulated with the find() method, but I had no success with that.
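
To make the effect of that preg_replace call concrete, here is a minimal sketch run on a made-up table cell (the sample markup and the variable names $sample and $stripped are assumptions for illustration, not taken from the real page). It suggests the pattern deletes only the href attribute text and leaves the <a> element itself in the output:

```php
<?php
// Hypothetical sample cell; the real markup comes from the fetched page.
$sample = '<td><a href="http://www.example.com/team/1">Team A</a></td>';

// Same pattern as in the loop above: it matches href="..." and replaces it
// with an empty string, so the <a> tag stays but no longer has a target.
$stripped = preg_replace('/href="?([^">]+)"/', '', $sample);

echo $stripped, PHP_EOL; // prints: <td><a >Team A</a></td>
```

So the link text is still wrapped in an <a> tag, just without a destination.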

Now I would like to remove the entire <tfoot> element, along with everything it contains.

But every time I try to 'find' something, the AJAX call returns an error, which suggests that something in my code is wrong.

How do I manipulate the contents of an element I have already found with simple_html_dom? I tried outputting the contents of $html to see what I would get, but the AJAX call runs indefinitely and never returns anything.


Solution

  • You could try this, using the built-in DOMDocument instead of simple_html_dom. However, if your AJAX call is timing out, the cause might be something else (for example, the page at example.com failing to load).

    if ( $match_results != '') {
    
        $html = new DOMDocument();
        // Suppress the warnings loadHTML() emits for malformed markup
        @$html->loadHTML($match_results);
    
        $out = '<div class="match_info_container">';
    
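        // Remove the "href" attribute from every <a>
        foreach ($html->getElementsByTagName('a') as $anchor) {
            $anchor->removeAttribute('href');
        }
    
        // Remove every <tfoot>. The node list is live, so copy it to an
        // array first; otherwise removing nodes mid-loop can skip elements.
        foreach (iterator_to_array($html->getElementsByTagName('tfoot')) as $tfoot) {
            $tfoot->parentNode->removeChild($tfoot);
        }
    
        // Append the markup of every <table> to the div.
        // (nodeValue would return only the text, without the table tags.)
        foreach ($html->getElementsByTagName('table') as $table) {
            $out .= $html->saveHTML($table);
        }
    
        $out .= '</div>';
    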
        wp_die ( $out );
    
    } else {