bash, shell, xargs, gnu-parallel

How to run multiple curl requests in parallel with multiple variables


Set Up

I currently have the script below working to download files with curl, using a ref file with multiple variables. When I created the script it suited my needs; however, as the ref file has grown and the data I request via curl takes longer to generate, the script now takes too long to complete.

Objective

I want to update this script so that curl requests and downloads multiple files as they become ready, rather than waiting for each file to be requested and downloaded sequentially.

I've had a look around and seen that I could use either xargs or parallel to achieve this. However, based on the past questions, YouTube videos and forum posts I've seen, I haven't been able to find an example that explains whether this is possible using more than one variable.

Can someone confirm whether this is possible and which tool is better suited to it? Is my current script structured appropriately, or do I need to rework a lot of it to shoehorn these commands in?

I suspect this question may have been asked previously and I may just not have found the right one.

account-list.tsv (tab-separated; columns: client, account, account ID, platform, platform ID)

client1 account1    123 platform1   50
client2 account1    234 platform1   66
client3 account1    344 platform1   78
client3 account2    321 platform1   209
client3 account2    321 platform2   342
client4 account1    505 platform1   69

download.sh

#!/bin/bash
set -eu

user="user"
pwd="pwd"
D1=$(date "+%Y-%m-%d" -d "1 days ago")
D2=$(date "+%Y-%m-%d" -d "1 days ago")
curr=$D2
cheese=$(pwd)

curl -o /dev/null -s -S -L -f -c cookiejar 'https://url/auth/' -d name=$user -d passwd=$pwd

while true; do

        while IFS=$'\t' read -r client account accountid platform platformid
        do
                curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account=$accountid
                curl -sSfL -o "$client€$account@$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
        done < account-list.tsv

        [ "$curr" \< "$D1" ] || break
        curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grab data for past date ranges.

done

exit

Solution

  • Using GNU Parallel, it looks something like this; -j100 runs up to 100 fetches at a time:

    #!/bin/bash
    set -eu
    
    user="user"
    pwd="pwd"
    D1=$(date "+%Y-%m-%d" -d "1 days ago")
    D2=$(date "+%Y-%m-%d" -d "1 days ago")
    curr=$D2
    cheese=$(pwd)
    
    curl -o /dev/null -s -S -L -f -c cookiejar 'https://url/auth/' -d name=$user -d passwd=$pwd
    
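    fetch_one() {
        client="$1"
        account="$2"
        accountid="$3"
        platform="$4"
        platformid="$5"
    
        # Work on a private copy of the login cookies so that concurrent jobs do
        # not overwrite each other's cookie file. This assumes the selected
        # account is tracked in the cookies rather than only on the server.
        jar=$(mktemp)
        cp cookiejar "$jar"
    
        curl -o /dev/null -s -S -f -b "$jar" -c "$jar" 'https://url/auth/' -d account="$accountid"
        curl -sSfL -o "$client€$account@$platform£$curr.xlsx" -J -b "$jar" -c "$jar" "https://url/platform=$platformid&date=$curr"
        status=$?
        rm -f "$jar"
        return "$status"
    }
    export -f fetch_one
    # parallel runs fetch_one in a new shell, so curr has to be in the environment
    export curr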
    
    while true; do
        # --colsep '\t' splits each input line on tabs; the five columns become $1..$5 in fetch_one
        cat account-list.tsv | parallel -j100 --colsep '\t' fetch_one
        [ "$curr" \< "$D1" ] || break
        curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grab data for past date ranges.
    done
    
    exit
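
The question also asks about xargs. As a rough sketch, the same multi-column fan-out can be done with xargs alone, assuming an xargs that supports -P (GNU and BSD both do) and fields that contain no spaces or quotes. The sample file demo-accounts.tsv, the fixed date and the printf worker are all stand-ins invented for illustration; the worker only prints what it would fetch instead of calling curl.

```shell
#!/bin/bash
# Sketch: hand five-column rows to parallel workers using xargs.
# The printf worker is a placeholder for the real curl calls (assumption).

export curr="2024-01-01"   # hypothetical date; exported so the workers can read it

# Hypothetical two-row sample in the same layout as account-list.tsv.
printf 'client1\taccount1\t123\tplatform1\t50\n'  > demo-accounts.tsv
printf 'client2\taccount1\t234\tplatform1\t66\n' >> demo-accounts.tsv

run_all() {
    # -n 5: pass five whitespace-separated fields to each worker invocation
    # -P 4: run up to four workers at once
    # The "_" after the script fills $0, so the fields arrive as $1..$5.
    xargs -n 5 -P 4 sh -c '
        printf "%s %s -> %s_%s_%s_%s.xlsx\n" "$3" "$5" "$1" "$2" "$4" "$curr"
    ' _ < "$1"
}

run_all demo-accounts.tsv
```

Output order is not guaranteed, since the workers run concurrently. In bash, after `export -f fetch_one`, the worker could instead be `bash -c 'fetch_one "$@"' _`. GNU Parallel remains the more convenient choice when fields may contain spaces, because --colsep splits on tabs only.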