I have 8 files that I would like to split into 5 chunks each. I would normally do this one file at a time, but would like to run it as a loop. I work on an HPC system.
I have created a list of the file names and labelled it "variantlist.txt". My code is:
for f in `cat variantlist.txt`; do split ${f} -n 5 -d; done
However, this only seems to split the last file in variantlist.txt: I end up with 5 chunks, all from the final entry.
Even if I list the files individually:
for f in chr001.vcf chr002 ...chr008.vcf ; do split ${f} -n 5 -d; done
It still only splits the final file into 5 chunks.
Not sure where I am going wrong here. The desired output would be 40 chunks, 5 per chromosome. Your help would be greatly appreciated.
Many thanks
split writes to the same default output names (x00 to x04) on every iteration, so each run overwrites the chunks from the previous file. Here's one way to handle that -
for f in $(<variantlist.txt)   # bash reads the file itself; don't use cat
do mkdir -p "$f.split"         # make a subdir for this file's chunks
   ( cd "$f.split" &&          # change into the subdir only in a subshell
     split "../$f" -n 5 -d     # split from there
   )                           # close the subshell, parent still in base dir
done
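To check the subdirectory layout before running it on real data, here is a small sketch on throwaway files. The scratch directory and the two tiny chr00N.vcf stand-ins are made up for the demo, and it reads the list with a plain while/read loop so it also runs under sh; GNU split is assumed for -n and -d.

```shell
# Work in a scratch directory so nothing real is touched
cd "$(mktemp -d)" || exit 1

# Two small stand-in files plus a list naming them (hypothetical data)
for i in 1 2; do
    seq 1 50 > "chr00$i.vcf"
    echo "chr00$i.vcf"
done > variantlist.txt

# One subdir per input file; split runs inside it via a subshell
while IFS= read -r f; do
    mkdir -p "$f.split"
    ( cd "$f.split" && split "../$f" -n 5 -d )
done < variantlist.txt

# Each subdir holds its own x00..x04, so the names no longer collide
ls chr001.vcf.split chr002.vcf.split
```

Both directories contain the same five names, which is exactly what was being overwritten when everything landed in one directory.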
Or you could just do this -
while IFS= read -r f           # grab each filename, one per line
do split "$f" -n 5 -d          # split it
   for x in x??                # for each split file
   do mv "$x" "$f.$x"          # rename it to include the parent file name
   done
done < variantlist.txt         # take names from this file
This is a lot slower, but doesn't use subdirs.
My favorite, though -
xargs -I {} split {} -n 5 -d {} < variantlist.txt
The last arg becomes the PREFIX for split, instead of the default of x.
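A quick way to see what the PREFIX argument does, sketched on throwaway data. The scratch directory and the three small chr00N.vcf files are hypothetical stand-ins, and the trailing dot in the prefix is my own addition to make the chunk names easier to read; GNU split is assumed.

```shell
# Work in a scratch directory so nothing real is touched
cd "$(mktemp -d)" || exit 1

# Three small stand-in files plus a list naming them (hypothetical data)
for i in 1 2 3; do
    seq 1 50 > "chr00$i.vcf"
    echo "chr00$i.vcf"
done > variantlist.txt

# Same call as above, but with "NAME." as the prefix
xargs -I {} split -n 5 -d {} {}. < variantlist.txt

ls chr001.vcf.*                        # the five chunks of the first file
ls | grep -c '\.vcf\.[0-9][0-9]$'      # total chunk count: 3 files x 5
```

One thing to be aware of: -n 5 divides by byte count, so a chunk can end in the middle of a line. If each chunk has to end on a line boundary, GNU split also accepts -n l/5, which makes five pieces without breaking lines.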
EDIT -- with 2 billion lines per file, use this one:
for f in $(<variantlist.txt)
do split "$f" -d -n 5 "$f" & # run all in background at the same time
done
wait # block until every background split has finished
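If starting every split at once is too much for a shared node, xargs can cap how many run at the same time. A sketch only, assuming an xargs that supports -P (GNU and BSD do; it is not in POSIX). The limit of 2 and the small chr00N.vcf stand-in files are arbitrary choices for the demo.

```shell
# Work in a scratch directory so nothing real is touched
cd "$(mktemp -d)" || exit 1

# Four small stand-in files plus a list naming them (hypothetical data)
for i in 1 2 3 4; do
    seq 1 50 > "chr00$i.vcf"
    echo "chr00$i.vcf"
done > variantlist.txt

# At most 2 split processes at a time; xargs returns once all have finished
xargs -P 2 -I {} split -d -n 5 {} {} < variantlist.txt

ls | grep -c 'vcf[0-9][0-9]$'          # total chunk count: 4 files x 5
```

Because xargs itself waits for its children, no separate wait is needed with this form.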