How can I download a copy of a website on Linux?
I have tried:
wget --recursive --level=inf https://example.com
However, it also downloaded links from different domains.
Also, is there a way to download a copy of the website after its JavaScript has run and produced output on the page? For example, a weather website might use JavaScript to look up the current temperature in a database and then render it. How can I capture the temperature, i.e. the final rendered output?
PhantomJS?
http://phantomjs.org/quick-start.html
I think this will do what you want.
The best thing to do is install from here:
Basically, you run it by writing a JavaScript script and passing it as a command-line argument, e.g.
phantomjs.exe someScript.js
There are loads of examples. For instance, you can render a website as an image:
phantomjs.exe github.js
where github.js looks like this:
var page = require('webpage').create();
page.open('http://github.com/', function() {
    page.render('github.png');
    phantom.exit();
});
This demo is at http://phantomjs.org/screen-capture.html
You can also show the webpage content as text.
For example, let's take a simple webpage, demo_page.html:
<html>
<head>
<script>
function setParagraphText() {
    document.getElementById("1").innerHTML = "42 is the answer.";
}
</script>
</head>
<body onload="setParagraphText();">
<p id="1">Static content</p>
</body>
</html>
And then create a test script, test.js:
var page = require('webpage').create();
page.open("demo_page.html", function(status) {
    console.log("Status: " + status);
    if (status === "success") {
        // plainText holds the page text after its scripts have run
        console.log('Page text: ' + page.plainText);
        console.log('All done');
    }
    phantom.exit();
});
Then run it from the console:
> phantomjs.exe test.js
Status: success
Page text: 42 is the answer.
All done
You can also inspect the page DOM and even update it:
var page = require('webpage').create();
page.open("demo_page.html", function(status) {
    console.log("Status: " + status);
    if (status === "success") {
        page.evaluate(function() {
            document.getElementById("1").innerHTML = "I updated the value myself";
        });
        console.log('Page text: ' + page.plainText);
        console.log('All done');
    }
    phantom.exit();
});
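To capture the final HTML rather than an image or plain text, something like the following might work. It is only a sketch: the script name save_rendered.js, the fileNameFor helper and the 2-second delay are my own assumptions, not part of PhantomJS. It relies on page.content (the page's current HTML) and the write function of the fs module, and it waits a moment before saving so that scripts such as the weather lookup have time to finish.

```javascript
// save_rendered.js: a sketch, run as `phantomjs save_rendered.js <url>`.

// Hypothetical helper: turn a URL into a safe local file name.
function fileNameFor(url) {
    var name = url.replace(/^[a-z]+:\/\//i, '')    // drop the scheme
                  .replace(/[^a-z0-9.-]+/gi, '_')  // replace unsafe characters
                  .replace(/^_+|_+$/g, '');        // trim stray underscores
    return (name || 'index') + '.html';
}

// The PhantomJS-specific part only runs when the `phantom` global exists.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var fs = require('fs');
    var page = require('webpage').create();
    var url = system.args[1];

    if (!url) {
        console.log('Usage: phantomjs save_rendered.js <url>');
        phantom.exit(1);
    } else {
        page.open(url, function (status) {
            if (status !== 'success') {
                console.log('Failed to load ' + url);
                phantom.exit(1);
                return;
            }
            // Assumed delay: give the page's scripts time to update the DOM.
            window.setTimeout(function () {
                // page.content is the HTML as it stands after scripts ran
                fs.write(fileNameFor(url), page.content, 'w');
                console.log('Saved ' + fileNameFor(url));
                phantom.exit();
            }, 2000);
        });
    }
}
```

Run it as phantomjs save_rendered.js https://example.com/ and the rendered markup should end up in example.com.html. A fixed delay is crude, so pages that load their data slowly may need a longer wait.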