I am trying to download the files listed in a file (test.txt) that contains over 15,000 links.
I have this script:
#!/bin/bash
function download {
    FILE=$1
    while read -r line; do
        url=$line
        wget -nc -P ./images/ "$url"
        # also download images that are not listed in test.txt,
        # by guessing their names: 12345_001.jpg, 12345_002.jpg .. 12345_005.jpg etc.
        wget -nc -P ./images/ "${url%.jpg}"_{001..005}.jpg
    done < "$FILE"
}
# test.txt contains the URLs
split -l 1000 ./temp/test.txt ./temp/split
# read the split files and pass each one to the download function
for f in ./temp/split*; do
    download "$f" &
done
test.txt:
http://xy.com/12345.jpg
http://xy.com/33442.jpg
...
I split the file into a few pieces and run download $f in the background (download $f &) so that several split files are processed in parallel. The downloads work, but the script does not seem to exit at the end; I have to press Enter to get my prompt back. If I remove the & from download $f &, it exits cleanly, but I lose the parallel downloading.
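Presumably the script needs to wait for its background jobs before it finishes; a minimal sketch of that idea, reusing the download function above:

# start one background job per split file, then block until all of them have exited
for f in ./temp/split*; do
    download "$f" &
done
wait   # block here until every background download job has finished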
Edit:
As I have since found out, this is not the best way to parallelize wget downloads. It would be better to use GNU Parallel.
May I commend GNU Parallel to you?
parallel --dry-run -j32 -a URLs.txt 'wget -ncq -P ./images/ {}; wget -ncq -P ./images/ {.}_{001..005}.jpg'
I am only guessing that your input file, URLs.txt, looks something like this:
http://somesite.com/image1.jpg
http://someothersite.com/someotherimage.jpg
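Here {} is replaced by each input line (the full URL) and {.} by the line with its extension removed, so {.}_{001..005}.jpg becomes the guessed numbered names once the shell brace-expands {001..005}.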
Or, using your own approach with a function:
#!/bin/bash
# define and export a function for "parallel" to call
doit(){
    wget -ncq -P ./images/ "$1"
    # quote "$2" but leave the braces outside the quotes so bash expands {001..005}
    wget -ncq -P ./images/ "$2"_{001..005}.jpg
}
export -f doit
parallel --dry-run -j32 -a URLs.txt doit {} {.}
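In both cases --dry-run only prints the commands that parallel would run; once the output looks right, remove it to actually start the downloads.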