
Pull HTML content from remote website and display on page


I've been working on this for a while now and am stumped. I am trying to pull the content of a specific div on a remote website's page and insert that HTML into a div on my own website. I know that jQuery's .ajax, .load, or .get methods alone cannot do this, since browsers normally block cross-domain requests under the same-origin policy.

Here's the remote page's HTML:

<html>
    <body>
        <div class="entry-content">
            <table class="table">
                ...table #1 content...
                ...More table content...
            </table>
            <table class="table">
                ...table #2 content...
            </table>
            <table class="table">
                ...table #3 content...
            </table>
        </div>
    </body>
</html>

Goal: fetch the HTML of the remote page's first table and place it, on my own website, in a div with id="fetched-html". The fetched HTML should be:

<table class="table">
    ...table #1 content...
    ...More table content...
</table>

Here's where I'm at with my PHP function thus far:

<?php
function pullRaspi_SDImageTable() {
    $url = "http://www.raspberrypi.org/downloads";
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($curl);
    curl_close($curl);

    // Create new PHP DOM document
    $DOM = new DOMDocument;
    // Load html from curl request into document model
    $DOM->loadHTML($output);

    // Get 1st table
    $output = $DOM->firstChild->getElementsByTagName('table');

    return $output;
}
?>

The final result should look like this on my local website page:

<div id="fetched-html">
    <table class="table">
        ...table #1 content...
        ...More table content...
    </table>
</div>

Here's another possible PHP function:

<?php
function pullRaspPi_SDImageTable() {
    // Url to fetch
    $url = "http://www.raspberrypi.org/downloads";

    $ch = curl_init($url);
    $fp = fopen("raspberrypi_sdimagetable.txt", "w");
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);

    // Write html source to variable
    $rasp_sdimagetable = curl_exec($ch);

    // Close curl request
    curl_close($ch);

    return $rasp_sdimagetable;
}

// Then in the head of the html, add this jQuery:
<script type="text/javascript">
    $("#fetched-html").load("<?php pullRaspPi_SDImageTable(); ?> table.table:first");
</script>

Problem is, neither function works. :( Any thoughts?


Solution

  • Extracting a fragment of HTML from a website is a breeze with simplehtmldom. You can then do something like this, which also caches the result so the remote server is not hit on every page load:

    function pullRaspi_SDImageTable() {
        $filename = '/tmp/downloads.html';  // Where you want to cache the result
        $expiry = 600;  // Cache lifetime in seconds (10 minutes)
        $output = '';

        if (!file_exists($filename) || time() - $expiry > filemtime($filename)) {
            // The cache is missing or older than $expiry, so fetch from the remote server
            require_once('simple_html_dom.php');
            $html = file_get_html('http://www.raspberrypi.org/downloads');
            // Collect the HTML of every matching table
            foreach ($html->find('div.entry-content table.table') as $elem) {
                $output .= (string)$elem;
            }

            // Store the cache
            file_put_contents($filename, $output);
        } else {
            // Pull the content from the cache
            $output = file_get_contents($filename);
        }

        return $output;
    }
    

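    This returns the HTML of every table.table inside div.entry-content. To get only the first table, as the question asks, pass an index to find(): $html->find('div.entry-content table.table', 0) returns the first match, or null if there is none.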
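
If you would rather not pull in a third-party library, PHP's built-in DOM extension can do the same job. The sketch below is only an illustration, not part of the original answer: extractFirstTable() and the $sample markup are hypothetical names made up for this example. It takes an HTML string (for instance the value returned by curl_exec() with CURLOPT_RETURNTRANSFER set, as in the question's first function) and returns the markup of the first table.table inside div.entry-content.

```php
<?php
// Hypothetical helper (not from the original post): returns the HTML of the
// first <table class="table"> inside <div class="entry-content">, or '' if
// the input is empty or no such table exists.
function extractFirstTable(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match the class as a whole word so "table" does not match "table-x".
    $xpath = new DOMXPath($dom);
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " entry-content ")]'
           . '//table[contains(concat(" ", normalize-space(@class), " "), " table ")]';
    $tables = $xpath->query($query);

    if ($tables === false || $tables->length === 0) {
        return '';
    }

    // saveHTML($node) serializes only that node, not the whole document.
    return $dom->saveHTML($tables->item(0));
}

// Sample markup shaped like the remote page in the question (made up here).
$sample = '<html><body><div class="entry-content">'
        . '<table class="table"><tr><td>one</td></tr></table>'
        . '<table class="table"><tr><td>two</td></tr></table>'
        . '</div></body></html>';

echo '<div id="fetched-html">' . extractFirstTable($sample) . '</div>';
```

Two things went wrong in the question's first attempt: getElementsByTagName() returns a list of nodes rather than a string of HTML, and $DOM->firstChild is often the doctype node rather than the html element. Selecting a single node and passing it to saveHTML() avoids both. Since the markup is echoed on the server, no jQuery .load() call is needed.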