I'm following the HTTrack docs example here: http://httrack.kauler.com/help/User-defined_structure
The site I need to scrape has URLs with this structure:
https://www.example.com/index.php?HelpTopics
https://www.example.com/index.php?MoreHelp
etc.
With HTTrack, I want to download the site and save the files in the format
HelpTopics.html
MoreHelp.html
etc.
I'm using this command line, modified from the docs linked above:
httrack "https://www.example.com" %n%[index.php?:-:::].%t
but all the files still get saved as
index2b26.html
and index2de7.html
etc.
What am I doing wrong with the HTTrack options? Is it breaking because the URLs on the original example.com site have no file extensions?
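For reference, if I'm reading that docs page right, the structure string is supposed to be passed through HTTrack's -N (user-defined structure) option rather than tacked on as a bare argument, and %[param] captures a named query parameter, not an arbitrary substring of the URL. Here's a sketch of what I'd have expected, where example-mirror is just a placeholder output folder and %[page] assumes a query parameter literally named page (which this site doesn't appear to have):

httrack "https://www.example.com" -O example-mirror -N "%n%[page].%t"

Since the query strings here are bare tokens like ?HelpTopics, with no name=value pair for %[...] to capture, I suspect HTTrack has nothing to substitute and falls back to index plus a hash suffix.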
Instead, I found it's much easier to use wget to save files with their original names. This does it:
wget --mirror -p --convert-links --content-disposition --trust-server-names -P examplefolder http://www.example.com
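For anyone else landing here: --mirror turns on recursion with infinite depth plus timestamping, -p pulls in the images and CSS each page needs, --convert-links rewrites links so the mirror browses locally, and -P examplefolder puts everything under examplefolder. The naming behaviour comes from the last two flags: --content-disposition takes the filename from the server's Content-Disposition header when one is sent, and --trust-server-names names each file after the final URL in a redirect chain rather than the originally requested one.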