How to mirror Wikipedia pages with the wget Linux command?


I want to mirror Wikipedia pages with the wget Linux command. I used this command:

wget --mirror -p --convert-links -P ./folder-mirror https://en.wikipedia.org/wiki/Portal:Contents/A–Z_index

But I only get one file: robots.txt


Solution

  • Robot exclusion is on by default in wget to keep folks from being jerks and recursively gobbling up someone else's web page and their bandwidth with it.

    You can turn it off in your .wgetrc file, or use wget's -e switch: -e robots=off

    This isn't to say that Wikipedia doesn't have further safeguards in place to ensure that your wget doesn't recursively download everything, but it will keep wget from honoring robots.txt and the robots meta tag.

    If you still hit the wall, then perhaps try tinkering with the user-agent (wget's --user-agent option) or something along those lines.
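
Putting the pieces above together, a rough sketch of the full invocation might look like the following. The destination folder and URL come from the question; the user-agent string is a made-up placeholder, and --wait=1 is my own addition to slow the crawl down a little, not something the answer requires. The script only prints the command unless you set RUN=1, so you can check it before hitting the site.

```shell
#!/bin/sh
# Values taken from the question.
URL='https://en.wikipedia.org/wiki/Portal:Contents/A–Z_index'
DEST='./folder-mirror'

# Hypothetical user-agent string; replace it with something that identifies you.
UA='my-mirror-test/0.1 (contact: you@example.com)'

# Build the argument list:
#   -e robots=off   run the .wgetrc command "robots = off" for this call only
#   --wait=1        pause one second between requests (an added courtesy)
#   --user-agent    override wget's default User-Agent header
set -- wget -e robots=off --mirror -p --convert-links \
    --wait=1 --user-agent="$UA" -P "$DEST" "$URL"

if [ "${RUN:-0}" = 1 ]; then
    "$@"                    # actually run wget
else
    printf '%s\n' "$*"      # dry run: just show the command
fi
```

To make the setting permanent instead, the equivalent line in ~/.wgetrc is `robots = off`. Either way, a full --mirror of a site this size will fetch a great many pages, so it is probably worth testing on a small page first.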