
wget: save with .jpg extension


I made this script to download .jpg files from a database:

for (( i = 1; i <= 9; i += 1 ))
do
wget http://archives.cg66.fr/mdr/index.php/docnumserv/getSubImage/0/0/0/-archives-009NUM_Etat_civil-Images---LLUPIA-2E1700_1702-FRAD066_2E1700_1702_000$i.jpg/0/100/0/100/100/100/100/100/2300/1500/0/100
done

Because of the "/0/100/0/100/100..." after the .jpg extension, the result is:

9 files named 100, 100.1, 100.2, 100.3 ... 100.9

I would like to find a way to get 9 .jpg files named 0001.jpg, 0002.jpg, 0003.jpg ... 0009.jpg instead.

Could you give me some help or advice?


Solution

  • You could try this way:

    ~$ URL1="http://archives.cg66.fr/mdr/index.php/docnumserv/getSubImage/0/0/0/-archives-009NUM_Etat_civil-Images---LLUPIA-2E1700_1702-FRAD066_2E1700_1702"
    ~$ URL2="0/100/0/100/100/100/100/100/2300/1500/0/100"
    ~$ for I in $(seq -w 0001 0009)
       do
          wget -O "${I}.jpg" "${URL1}_${I}.jpg/${URL2}"
       done
    

    To populate the I variable with the leading zeros I use seq -w 0001 0009. To download each image with the right filename I use wget -O "${I}.jpg" "${URL1}_${I}.jpg/${URL2}". This also works with more than 9 images; e.g., to produce a sequence of numbers from 1 to 999 with leading zeros (0001 ... 0099 ... 0999) the command becomes seq -w 0001 0999.
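
    A quick check of the zero-padding (a minimal example on a shorter range, assuming GNU seq as used above):

    ~$ seq -w 0001 0003
    0001
    0002
    0003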

    See man seq and man wget for the documentation.

    Of course the URL must not keep the literal leading zeros between the underscore and the variable ${I}, since ${I} already contains them; otherwise the wget command would request a non-existent image and download an error page instead.

    For this reason I changed the URL from this: ..._1702_000$i.jpg/0/100/... to this: ..._1702_${I}.jpg/0/100/....
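
    An equivalent approach, closer to the original script, keeps the C-style loop and zero-pads the counter with printf; this is only a sketch, assuming the same URL1 and URL2 variables defined above:

    ~$ for (( N = 1; N <= 9; N++ ))
       do
          I=$(printf '%04d' "$N")                       # zero-pad to four digits: 0001 ... 0009
          wget -O "${I}.jpg" "${URL1}_${I}.jpg/${URL2}"
       done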

    The downloaded files:

    ~$ ls -l 
    total 20404
    -rw-r--r-- 1 ale ale 2408227 Oct  9 22:38 0001.jpg
    -rw-r--r-- 1 ale ale 2422199 Oct  9 22:38 0002.jpg
    -rw-r--r-- 1 ale ale 2330667 Oct  9 22:38 0003.jpg
    -rw-r--r-- 1 ale ale 2162542 Oct  9 22:38 0004.jpg
    -rw-r--r-- 1 ale ale 2579155 Oct  9 22:38 0005.jpg
    -rw-r--r-- 1 ale ale 2175118 Oct  9 22:38 0006.jpg
    -rw-r--r-- 1 ale ale 2174325 Oct  9 22:38 0007.jpg
    -rw-r--r-- 1 ale ale 2421311 Oct  9 22:38 0008.jpg
    -rw-r--r-- 1 ale ale 2202587 Oct  9 22:38 0009.jpg
    

    EDIT: Another alternative. First I create a file with the list of URLs:

    ~$ URL1="http://archives.cg66.fr/mdr/index.php/docnumserv/getSubImage/0/0/0/-archives-009NUM_Etat_civil-Images---LLUPIA-2E1700_1702-FRAD066_2E1700_1702"
    ~$ URL2="0/100/0/100/100/100/100/100/2300/1500/0/100"
    ~$ for I in $(seq -w 0001 0009)
       do
          echo "${URL1}_{${I}}.jpg/${URL2}" >> url_list.txt
       done
    

    The loop writes the counter wrapped in braces, ..._1702_{${I}}.jpg/0/100..., so that curl treats {0001}, {0002}, ... as (single-element) URL globs and substitutes the globbed value for #1 in the output filename '#1.jpg'.

    ~$ xargs -P 10 -n 1 curl -o '#1.jpg' < url_list.txt
    
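    To verify the globbing behaviour on a single URL before running the whole list (a minimal check, assuming the same URL1 and URL2 variables defined above):

    ~$ # curl expands {0001} as a one-element glob and puts its value in place of #1
    ~$ curl -o '#1.jpg' "${URL1}_{0001}.jpg/${URL2}"
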

    However, this solution may overload the web server. In case of trouble, it might be helpful to fall back on the wget solution and add the option --limit-rate=amount, which limits the download speed to amount bytes per second; append k for kilobytes or M for megabytes.
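
    For example, to cap each download at 200 kilobytes per second (200k here is just an illustrative value):

    ~$ wget --limit-rate=200k -O "${I}.jpg" "${URL1}_${I}.jpg/${URL2}"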
