I need to download files from several pages, using

wget -r -l 1 -nd -H --accept-regex 'https://blogspot.com/s[0-9]{4}/[0-9]{3}.pdf' -i list.txt

In the TXT file I have a list of all the pages from which I need to download, one per line, like

https://blogspot.com/test/001
https://blogspot.com/test/002

and so on.
I'm trying to create a different folder for each source, so that all the files downloaded from https://blogspot.com/test/001 are in a folder named 001, all those from https://blogspot.com/test/002 are in a folder named 002, and so on.
How could I do that?
You might use -P to instruct GNU wget where to store downloads, e.g.

wget -P examplepage -np -r -l 1 http://www.example.com

will store what it downloads inside the examplepage directory. That directory will be created if it does not exist yet.
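As a minimal sketch combining -P with the flags from your question (assuming you call wget once per page rather than with -i):

wget -P 001 -r -l 1 -nd -H \
    --accept-regex 'https://blogspot.com/s[0-9]{4}/[0-9]{3}.pdf' \
    https://blogspot.com/test/001

Here the folder name 001 is chosen by hand; automating that choice per URL is what the loop below does.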
I'm trying to create different folders for each source, so that all the files downloaded from https://blogspot.com/test/001 are in a folder named 001, all those from https://blogspot.com/test/002 are in a folder 002, and so on.
I do not know if it is possible with a single wget call. You might use a loop to process the file line by line. For example, let's say urls.txt contains
http://www.example.com?page=001
http://www.example.com?page=002
http://www.example.com?page=003
and I wish to download the 1st into a directory named 001, the 2nd into a directory named 002, and the 3rd into a directory named 003. I could do that with
while IFS= read -r line; do
    # strip everything up to and including "page=" to get the directory name
    dirname=$(echo "$line" | sed 's/.*page=//')
    wget -P "$dirname" "$line"
done < urls.txt
Explanation: I use a while loop to process the file urls.txt line by line, and GNU sed to build each directory name by removing everything up to page= from the URL. IFS= read -r keeps read from mangling leading whitespace and backslashes in the lines.
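As a sketch adapted to the URLs from your question (assuming each line of list.txt ends in the folder number, e.g. https://blogspot.com/test/001), the same idea looks like:

while IFS= read -r url; do
    # take the last path segment as the folder name, e.g. 001
    dir=$(echo "$url" | sed 's|.*/||')
    wget -P "$dir" -r -l 1 -nd -H \
        --accept-regex 'https://blogspot.com/s[0-9]{4}/[0-9]{3}.pdf' \
        "$url"
done < list.txt

If you prefer to avoid sed, plain shell parameter expansion does the same job: dir="${url##*/}" here, or dirname="${line##*page=}" for the page= example above.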