javascript python html css screen-scraping

Save an HTML page + change all links to point to the right place

You probably know that IE has a feature that lets you save a web page: it automatically downloads the HTML file and all the image/CSS/JS files that the HTML file uses.

There is one problem with this: the links in the HTML file are not changed. So if I download a page from example.com that contains <a href="/hi.html">, the copy saved by IE will have a link pointing to C:\Documents and Settings...(path to the folder that the HTML file is in) instead of to example.com.

Is there a Python library that will download an HTML page for me, along with all of its contents (images/JS/CSS)? If so, is there a library that will also change the links for me?

Thanks!!

Solution

Since you mention IE specifically, I'm not sure whether this will be of any use to you, but on Linux the easiest way to completely mirror a website is the wget command.

wget --mirror --convert-links -w 1 http://www.example.com

Run man wget if you need more options.
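
If you would rather stay in Python, the link-fixing half of the question can be roughed out with the standard library alone. This is only a minimal sketch, not a replacement for wget: it assumes that "the right place" means the original site, so it turns every href/src value into an absolute URL and does not download anything. The names LinkAbsolutizer, absolutize_links and URL_ATTRS are made up for this example, and the list of URL-carrying attributes is an assumption that is certainly not exhaustive.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: only these attributes are treated as URLs (not exhaustive).
URL_ATTRS = {"href", "src"}


class LinkAbsolutizer(HTMLParser):
    """Hypothetical helper: re-emits HTML with href/src made absolute."""

    def __init__(self, base_url):
        # Keep character references as-is in text so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _tag(self, tag, attrs, end=">"):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Attribute without a value, e.g. <input disabled>
                parts.append(name)
            else:
                if name in URL_ATTRS:
                    # Resolve relative links against the page's own URL.
                    value = urljoin(self.base_url, value)
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + end)

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, end=" />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def absolutize_links(html_text, base_url):
    """Return html_text with href/src values resolved against base_url."""
    parser = LinkAbsolutizer(base_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling absolutize_links(page_source, "http://www.example.com/") on a saved page would make a link such as /hi.html point back at example.com rather than at a local folder. Note that the output is re-serialized, so tag names come out lowercased and attribute values are always double-quoted; for a full offline copy with local links, wget's --convert-links remains the simpler route.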