Search code examples
perlwgetmirror

How do I completely mirror a web page?


I have several web pages on several different sites that I want to mirror completely. This means that I will need images, CSS, etc, and the links need to be converted. This functionality would be similar to using Firefox to "Save Page As" and selecting "Web Page, complete". I'd like to name the files and corresponding directories as something sensible (e.g. myfavpage1.html,myfavpage1.dir).

I do not have access to the servers, and they are not my pages. Here is one sample link: Click Me!

A little more clarification... I have about 100 pages that I want to mirror (many from slow servers), I will be cron'ing the job on Solaris 10 and dumping the results every hour to a samba mount for people to view. And, yes, I have obviously tried wget with several different flags but I haven't gotten the results for which I am looking. So, pointing to the GNU wget page is not really helpful. Let me start with where I am with a simple example.

 wget --mirror -w 2 -p --html-extension --tries=3 -k -P stackperl.html "https://stackoverflow.com/tags/perl"

From this, I should see the https://stackoverflow.com/tags/perl page in the stackper.html file, if I had the flags correct.


Solution

  • If your just looking to run a command and get a copy of a web site, use the tools that others have suggested, such as wget, curl, or some of the GUI tools. I use my own personal tool that I call webreaper (that's not the Windows WebReaper though. There are a few Perl programs I know about, including webmirror and a few others you can find on CPAN.

    If you're looking to do this inside a Perl program you are writing (since you have the "perl" tag on your answer), there are many tools in CPAN that can help you at each step:

    Good luck, :)