I have hundreds of thousands of files that I want to run the following sed command on:
sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G'
So far, I have been using a bash loop:
for i in read_* ; do
    sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G' "$i"
    mv "$i" "$i.fasta"
done
How can I use GNU Parallel to speed this up?
ls read_* > list.read.txt
parallel -j $cores -a list.read.txt sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G' []
I tried the above method, where I create a list of files to iterate over and run 10 jobs at once; however, I get sed-related errors.
Try
parallel -q -v -j "$cores" -a list.read.txt sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G'
The -q option is necessary to quote special characters (spaces, >, ...) in the command arguments.
[] was causing the code to break when I tested it, so I removed it. I don't know what it was supposed to do.
"$cores" is quoted because variable expansions should almost always be quoted. See "When to wrap quotes around a shell variable?". Use ShellCheck to find missing quotes and many other shell code errors.
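If the rename from the original loop is also needed, one option is to wrap both steps in a bash function and let GNU Parallel call it once per file, which also sidesteps most of the quoting trouble. The following is only a sketch under assumptions: the function name convert_read, the sample file contents, and the default of 2 jobs are made up for illustration, and it falls back to a plain loop when GNU Parallel is not installed.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash and GNU sed. File contents below are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Made-up four-line input: read name, read sequence, reference name, reference sequence.
printf '%s\n' 'read1 extra' 'ACGT' '|chr1| extra' 'ACGA' > read_1
cp read_1 read_2

# Hypothetical helper: the sed script from the question, then the rename from the loop.
convert_read() {
    sed -s -i -e 's/[[:space:]].*//' -e '1 s/^/>/g' -e '3 s/|*//g' \
        -e '3 s/^/>ref/g' -e '1h;2H;1,2d;4G' "$1" &&
    mv "$1" "$1.fasta"
}
export -f convert_read   # make the function visible to the shells parallel starts

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # printf is a shell builtin, so a very long file list is not limited by the
    # maximum argument length; parallel reads one file name per line from stdin.
    printf '%s\n' read_* | parallel -j "${cores:-2}" convert_read {}
else
    # Fallback so the sketch still runs without GNU Parallel.
    for f in read_*; do convert_read "$f"; done
fi

cat read_1.fasta
```

Because the sed expressions live inside the function, GNU Parallel only ever sees a function name and a file name, so -q is not needed here. On real data, drop the mktemp and printf setup lines and run the remainder in the directory that holds the read_* files.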