
Tool to download files (including files without a direct link) from a website?


I have been trying to find a way to download files from a URL such as https://.com//. I learned about wget and tried quite a few options, but realized it does not download any file that is not directly linked from the index file or some other page.

For example, I'd like to download everything from https://somesites.com/myfolder/myfiles/.
Let's say there is an index.html in the "myfiles" directory, along with many HTML files and a couple of directories that are all referenced and linked from the index. There are also a couple of other HTML files, such as sample123.html and sample456.html, that are not linked anywhere.

With pretty much all of the common, well-known options, wget successfully downloads everything except sample123.html and sample456.html.
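Recursive wget (`wget -r`) discovers files by downloading a page, parsing it for links, and then following those links. A file that no page links to is never requested, so this kind of crawl cannot discover it. The Python sketch below imitates that discovery loop against a hypothetical in-memory site shaped like the example above; the `SITE` dict, `LinkExtractor`, and `crawl` are illustrative assumptions, not wget's actual implementation.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory site: URL path -> HTML body served at that path.
# sample123.html and sample456.html exist, but no page links to them.
SITE = {
    "/myfolder/myfiles/index.html": (
        '<a href="page1.html">Page 1</a> <a href="sub/">Subdirectory</a>'
    ),
    "/myfolder/myfiles/page1.html": '<a href="index.html">Back</a>',
    "/myfolder/myfiles/sub/": '<a href="page2.html">Page 2</a>',
    "/myfolder/myfiles/sub/page2.html": "No links here.",
    "/myfolder/myfiles/sample123.html": "Unlinked page.",
    "/myfolder/myfiles/sample456.html": "Another unlinked page.",
}


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def crawl(start):
    """Breadth-first crawl that, like recursive wget, only visits URLs found in links."""
    seen = set()
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if path in seen or path not in SITE:
            continue  # already fetched, or the "server" would answer 404
        seen.add(path)
        parser = LinkExtractor()
        parser.feed(SITE[path])
        for href in parser.links:
            # Resolve each relative link against the page it appeared on.
            queue.append(urljoin(path, href))
    return seen


found = crawl("/myfolder/myfiles/index.html")
print(sorted(set(SITE) - found))
# ['/myfolder/myfiles/sample123.html', '/myfolder/myfiles/sample456.html']
```

Every page reachable through links is found, but the two unlinked pages never are, because nothing ever tells the crawler their names. Short of supplying those names yourself, no crawl option can change that.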

Is there any other tool that will grab ALL files located in https://somesites.com/myfolder/myfiles/, whether or not they are directly linked?

I also tried lftp against the HTTP URL, but it downloaded far fewer files than wget.

I looked through Stack Overflow for this, but the recommended commands (using wget) only download files that have a direct link.


Solution

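  • What you want to do is not possible, and it could be a security problem. Over HTTP, a client fetches a file by requesting its exact URL; unless the server is configured to expose a directory listing, there is no way to ask it which files a folder contains, so a tool can only find files that some page links to (or whose names it already knows). Imagine, for example, that someone has a file with sensitive data inside the folder and that file is not listed anywhere. You are asking for a tool that would download that file too.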

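    So, as said, it is not possible. That is why disabling directory listing on HTTP servers is always recommended as a security measure: with listing enabled, the server's auto-generated index page links every file in the folder, which is exactly what would let you do what you want.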
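Both points can be checked locally with Python's standard-library `http.server`, whose `SimpleHTTPRequestHandler` serves `index.html` when a directory contains one and otherwise auto-generates a directory listing. This is a minimal sketch under stated assumptions: the file names (`page1.html`, `sample123.html`) and the `demo` helper are hypothetical, the server runs on a free port on 127.0.0.1, and production servers such as Apache or nginx enable or disable listings through their own configuration.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Ignore proxy settings from the environment so requests go straight to the local server.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # silence the default per-request log lines on stderr


def links_on(url):
    """Fetch a page and return the links it contains."""
    with OPENER.open(url) as resp:
        parser = LinkExtractor()
        parser.feed(resp.read().decode("utf-8", errors="replace"))
        return parser.links


def status_of(url):
    """Return the HTTP status code of a GET request to url."""
    try:
        with OPENER.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def demo(with_index):
    """Serve a temp folder; return (links on '/', status of the unlinked file)."""
    with tempfile.TemporaryDirectory() as root:
        folder = pathlib.Path(root)
        (folder / "page1.html").write_text("linked page")
        (folder / "sample123.html").write_text("unlinked page")
        if with_index:
            (folder / "index.html").write_text('<a href="page1.html">Page 1</a>')
        handler = functools.partial(QuietHandler, directory=root)
        # Port 0 asks the OS for any free port on the loopback interface.
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        base = f"http://127.0.0.1:{server.server_address[1]}/"
        try:
            return links_on(base), status_of(base + "sample123.html")
        finally:
            server.shutdown()
            server.server_close()


for with_index in (True, False):
    links, status = demo(with_index)
    print(f"index.html present: {with_index}; links on '/': {links}; sample123.html -> {status}")
```

With `index.html` present, the only link a crawler sees is `page1.html`, yet `sample123.html` still answers 200 to anyone who already knows its exact name. Without `index.html`, the generated listing links every file, so a link-following tool such as recursive wget could then fetch all of them, which is what disabling directory listing prevents.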