I am trying to use parallel in a bash script to verify whether S3 paths exist. I am checking multiple S3 paths by counting the objects under each path; if the object count is zero, the script should continue to the next date in the for loop, but with parallel it is not working as expected.
For the date range I provided in the for loop, we actually don't have those folders in the S3 bucket, and in the checkS3Path function I create a 0 KB file when the S3 path doesn't exist, but I don't see those 0 KB files being created after the script runs. In the output I am seeing S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-03 instead of S3 Path Doesnt Exists folder1:+2019-10-03. Please see the output below.
Please let me know what might be the issue.
Here is the sample code.
#!/bin/bash
#set -x

s3Bucket=testbucket
version=v20
Array=(folder1 folder2 folder3)

checkS3Path() {
  fldName=$1
  date=$2
  objectNum=$(aws s3 ls s3://${s3Bucket}/${version}/${fldName}/date=${date}/ | wc -l)
  echo $objectNum
  if [ "$objectNum" -eq 0 ]
  then
    echo "S3 Path Doesnt Exists ${fldName}:${date}" >> /app/${fldName}.log
    touch /home/ubuntu/${fldName}_${date}.txt
    continue
  else
    echo "S3 Path Consists csv Files, Proceeding to next step ${fldName}:${date}"
  fi
}

final() {
  fldName=$1
  date=$2
  checkS3Path $fldName $date
  function2 $fldName $date
  function3 $fldName $date
}

export -f final checkS3Path

for date in 2019-10-{01..03}
do
  # final folder1 $date
  parallel --jobs 4 --eta final ::: "${Array[@]}" ::: +"$date"
done
Here is the output I am seeing.
$ ./test.sh
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-01
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/2.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-01
ETA: 0s Left: 12 AVG: 0.00s local:4/2/100%/1.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-01
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-02
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-02
ETA: 6s Left: 12 AVG: 0.50s local:4/2/100%/0.5s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-02
ETA: 3s Left: 11 AVG: 0.33s local:4/3/100%/0.3s 202
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-03
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/1.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-03
ETA: 0s Left: 12 AVG: 0.00s local:4/2/100%/0.5s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-03
ETA: 0s Left: 11 AVG: 0.00s local:4/3/100%/0.3s 202
$
Thanks
If checkS3Path
works when run by hand, then you probably just need to:
export s3Bucket=testbucket
export version=v20
Each GNU Parallel job runs in its own shell (started from Perl), which is why you need to export variables if you want them to be visible to the job.
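As a minimal sketch of the effect (hypothetical names; any GNU Parallel install will do): an exported function is visible in the job, but a variable it references expands to empty there unless it is exported too.

greeting="hello"                 # hypothetical demo variable, not exported yet
say() { echo "greeting is: '$greeting'"; }
export -f say                    # the function is exported, so parallel can run it
parallel say ::: run1            # prints: greeting is: ''
export greeting                  # now the variable is exported as well
parallel say ::: run1            # prints: greeting is: 'hello'

The same thing happens to s3Bucket and version inside checkS3Path when it is started by parallel.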
Also look at env_parallel
to do this automatically.
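As a sketch, assuming bash and a standard GNU Parallel install (the env_parallel man page has the exact activation line for each shell):

. $(which env_parallel.bash)     # activate env_parallel for bash, usually done once in ~/.bashrc

# env_parallel copies the current environment (variables and functions) into the jobs,
# so s3Bucket, version and the functions are visible without explicit export lines
env_parallel --jobs 4 --eta final ::: "${Array[@]}" ::: +"$date"

Note that this still has to run inside the same for loop over the dates, just like the original parallel call.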