javascript, linux, download, wget, web-testing

How to download a website with the results of its JavaScript lookups included?


How can I download a copy of a website on Linux?

I have tried wget --recursive --level=inf https://example.com, but it also downloaded links from other domains.

Also, is there a way to download a copy of the website after its JavaScript has run and rendered its output on the page? For example, a weather website might use JavaScript to look up the current temperature in a database and then render the result. How can I capture the temperature, i.e. the final output?


Solution

  • PhantomJS?

    http://phantomjs.org/quick-start.html
    

    I think this will do what you want!

    The best thing to do is install from here:

    http://phantomjs.org/

    Basically, you write a JavaScript script and pass it to PhantomJS as a command-line argument (on Linux the binary is phantomjs, without the .exe), e.g.

    phantomjs.exe someScript.js
    

    There are loads of examples. For instance, you can render a website as an image:

    phantomjs.exe github.js
    

    where github.js looks like this:

    var page = require('webpage').create();
    page.open('http://github.com/', function() {
      page.render('github.png');
      phantom.exit();
    });
    

    This demo is at http://phantomjs.org/screen-capture.html

    You can also show the webpage content as text.

    For example, let's take a simple webpage, demo_page.html:

    <html>
        <head>
            <script>
            function setParagraphText() {
                document.getElementById("1").innerHTML = "42 is the answer.";
            }
            </script> 
        </head>
        <body onload="setParagraphText();">
            <p id="1">Static content</p>
        </body>
    </html>
    

    And then create a test script, test.js:

    var page = require('webpage').create();
    
    page.open("demo_page.html", function(status) {
        console.log("Status: " + status);
        if(status === "success") {
            console.log('Page text: ' + page.plainText);
            console.log('All done');
        }
        phantom.exit();
    });
    

    Then run it from the console:

    > phantomjs.exe test.js
    Status: success
    Page text: 42 is the answer.
    All done
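
    If you want the rendered HTML rather than the plain text, page.content holds the markup as it stands after the scripts have run, and the fs module can write it to a file. The following is only a sketch: the script name save_rendered.js, the helper outputNameFor and the 2-second delay are my own assumptions, not part of PhantomJS, and pages that load data slowly may need a longer wait.

    ```javascript
    // save_rendered.js: a sketch, run as `phantomjs save_rendered.js <url>`.

    // Hypothetical helper: turn a URL into a safe file name.
    function outputNameFor(url) {
        return url
            .replace(/^[a-z]+:\/\//i, '')      // drop the scheme, e.g. "https://"
            .replace(/[^A-Za-z0-9.-]+/g, '_')  // replace characters unsafe in file names
            .replace(/^_+|_+$/g, '') + '.html';
    }

    // Only run the PhantomJS-specific part when the phantom global exists.
    if (typeof phantom !== 'undefined') {
        var system = require('system');
        var fs = require('fs');
        var page = require('webpage').create();
        var url = system.args[1];

        if (!url) {
            console.log('Usage: phantomjs save_rendered.js <url>');
            phantom.exit(1);
        } else {
            page.open(url, function (status) {
                if (status !== 'success') {
                    console.log('Failed to load ' + url);
                    phantom.exit(1);
                    return;
                }
                // Assumed delay: give asynchronous scripts 2 seconds to finish.
                setTimeout(function () {
                    // page.content is the HTML after JavaScript has modified the DOM.
                    fs.write(outputNameFor(url), page.content, 'w');
                    console.log('Saved ' + outputNameFor(url));
                    phantom.exit();
                }, 2000);
            });
        }
    }
    ```

    A fixed delay is the simplest option; it trades reliability for brevity, so treat the saved file as a snapshot of whatever had rendered by then.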
    

    You can also inspect the page DOM and even update it:

    var page = require('webpage').create();
    
    page.open("demo_page.html", function(status) {
        console.log("Status: " + status);
        if(status === "success") {
            page.evaluate(function(){
                document.getElementById("1").innerHTML = "I updated the value myself";
            });
    
            console.log('Page text: ' + page.plainText);
            console.log('All done');
        }
        phantom.exit();
    });
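
    page.evaluate can also return a value to your script, which is probably the easiest way to capture a single piece of output such as the temperature. Here is a minimal sketch against the demo page above; the script name extract_value.js and the helper textOfElement are hypothetical names of mine. For content that is fetched asynchronously you would likely need to defer the call, for example with setTimeout.

    ```javascript
    // extract_value.js: a sketch, run as `phantomjs extract_value.js`.

    // This function is executed inside the page by page.evaluate, so it can only
    // use its arguments and the page's own globals (such as document), not
    // variables defined elsewhere in this script.
    function textOfElement(id) {
        var el = document.getElementById(id);
        return el ? el.textContent : null;
    }

    // Only run the PhantomJS-specific part when the phantom global exists.
    if (typeof phantom !== 'undefined') {
        var page = require('webpage').create();

        page.open('demo_page.html', function (status) {
            if (status !== 'success') {
                console.log('Failed to load the page');
                phantom.exit(1);
                return;
            }
            // Extra arguments to page.evaluate are passed on to the function.
            var value = page.evaluate(textOfElement, '1');
            console.log('Element text: ' + value);
            phantom.exit();
        });
    }
    ```

    Only simple, JSON-serializable values come back from page.evaluate, so return strings or numbers rather than DOM nodes.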