I currently have the below script working to download files with curl, using a ref file with multiple variables. When I created the script it suited my needs, but as the ref file has grown and the data I am requesting via curl takes longer to generate, the script now takes too long to complete.
I want to update this script so that curl requests and downloads multiple files as they become ready, rather than waiting for each file to be requested and downloaded sequentially.
I've had a look around and seen that I could use either xargs or parallel to achieve this, but based on past questions, YouTube videos and other forum posts, I haven't been able to find an example that shows whether this is possible with more than one variable per line.
Can someone confirm whether this is possible, and which tool is better suited? Is my current script in roughly the right shape, or do I need to rework a lot of it to shoehorn these commands in?
I suspect this is a question that's been asked before and I may just not have found the right one.
Ref file (account-list.tsv):
client1 account1 123 platform1 50
client2 account1 234 platform1 66
client3 account1 344 platform1 78
client3 account2 321 platform1 209
client3 account2 321 platform2 342
client4 account1 505 platform1 69
My current script:
#!/bin/bash
set -eu
user="user"
pwd="pwd"
D1=$(date "+%Y-%m-%d" -d "1 days ago")
D2=$(date "+%Y-%m-%d" -d "1 days ago")
curr=$D2
cheese=$(pwd)
# log in once and store the session cookie
curl -o /dev/null -s -S -L -f -c cookiejar 'https://url/auth/' -d name="$user" -d passwd="$pwd"
while true; do
while IFS=$' ' read -r client account accountid platform platformid
do
# switch the session to this account, then download its report for $curr
curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account="$accountid"
curl -sSfL -o "$client€$account@$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
done < account-list.tsv
[ "$curr" \< "$D1" ] || break
curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grab data for past date ranges.
done
exit
Using GNU Parallel it looks something like this to fetch 100 entries in parallel:
#!/bin/bash
set -eu
user="user"
pwd="pwd"
D1=$(date "+%Y-%m-%d" -d "1 days ago")
D2=$(date "+%Y-%m-%d" -d "1 days ago")
curr=$D2
export curr # fetch_one runs in child shells started by parallel, so curr must be exported
cheese=$(pwd)
# log in once and store the session cookie
curl -o /dev/null -s -S -L -f -c cookiejar 'https://url/auth/' -d name="$user" -d passwd="$pwd"
fetch_one() {
  client="$1"
  account="$2"
  accountid="$3"
  platform="$4"
  platformid="$5"
  # switch the session to this account, then download its report for $curr
  # (note: all jobs share one cookiejar file; if the server rotates the
  # session cookie per request you may need a separate jar per job)
  curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account="$accountid"
  curl -sSfL -o "$client€$account@$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
}
export -f fetch_one # make the function visible to the bash shells parallel spawns
while true; do
# --colsep takes a regex; if the file is space-separated, use --colsep ' ' (or '\s+')
parallel -j100 --colsep '\t' fetch_one < account-list.tsv
[ "$curr" \< "$D1" ] || break
curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grab data for past date ranges.
done
exit
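To answer the xargs half of the question: yes, it can pass multiple variables too, since every whitespace-separated token on an input line becomes a separate argument. A minimal sketch, assuming the same account-list.tsv and the exported fetch_one function and curr variable from above (-P for parallelism is a GNU/BSD xargs extension, not POSIX):
# -n5 hands the five columns of each line to one bash -c call as "$@"
# (the _ fills $0); -P100 keeps up to 100 of them running at once
xargs -n5 -P100 bash -c 'fetch_one "$@"' _ < account-list.tsv
Both tools can do the multi-variable job, but xargs's default tokenizer treats quotes and backslashes specially and cannot represent empty columns, so parallel with --colsep is the more robust fit for column-oriented input like this.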