I have a tab-separated text file, e.g. table.txt, in which one column holds file paths:
> SampleID Factor Condition Replicate Treatment Type Dataset isPE ReadLength isREF PathFASTQ
> DG13 fd3 c1 1 cc 0 0102 0 50 1 "/path/to/fastq"
> DG14 fd3 c1 1 cc 1 0102 0 50 1 "/path/to/fastq"
I would like to store the paths in a bash array so I can use them in a downstream parallel computation (SGE Task Arrays). For simplicity, the leading and trailing " can easily be left out of table.txt.
Excluding the header line, I tried the following:
files=($(awk '{ if(($8 == 0)) { print $1} }' table.txt ))
paths=($(awk '{ if(($8 == 0)) { print $11} }' table.txt ))
infile="${paths[$SGE_TASK_ID]}"/"${files[$SGE_TASK_ID]}".fastq.gz
In case someone does not know, $SGE_TASK_ID takes a user-defined integer value between 1 and N.
Unfortunately, $infile does not show the expected value for SGE_TASK_ID=1, which would be:
/path/to/fastq/DG13.fastq.gz
Thanks for your help.
Could you please try the following? This code removes Control-M (carriage return) characters from each line as it runs.
myarr=($(awk '{gsub(/\r/,"")} match($NF,/\/[^"]*/){\
val=substr($NF,RSTART,RLENGTH);\
num=split(val,array,"/");\
print val"/"$1"."array[num]".gz"}' Input_file))
for i in "${myarr[@]}"
do
echo "$i"
done
If you want to remove the Control-M characters from Input_file itself, run the following as well:
tr -d '\r' < Input_file > temp && mv temp Input_file
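Before rewriting the file, it may be worth checking whether it contains carriage returns at all. A minimal sketch, assuming bash (for the $'\r' quoting) and using a hypothetical temporary file in place of Input_file:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for Input_file: one line with a Windows (CRLF) line ending.
tmp_file=$(mktemp)
printf 'DG13\t"/path/to/fastq"\r\n' > "$tmp_file"

# Count the lines that contain a carriage return (Control-M).
cr_before=$(grep -c $'\r' "$tmp_file")
echo "lines with CR before cleaning: $cr_before"

# Same clean-up as above, applied to the stand-in file.
tr -d '\r' < "$tmp_file" > "$tmp_file.clean" && mv "$tmp_file.clean" "$tmp_file"

# grep -c still prints 0 when nothing matches, but exits non-zero, hence "|| true".
cr_after=$(grep -c $'\r' "$tmp_file" || true)
echo "lines with CR after cleaning: $cr_after"

rm -f "$tmp_file"
```

A non-zero count before cleaning suggests the file was saved with Windows line endings, which would leave a stray carriage return at the end of the last field.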
Printing the array with the loop shown above gives the following output:
/path/to/fastq/DG13.fastq.gz
/path/to/fastq/DG14.fastq.gz
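One further point when picking a single element inside an SGE task: bash indexed arrays start at 0, whereas $SGE_TASK_ID starts at 1, so task 1 corresponds to element 0. The sketch below illustrates this under a few assumptions: the temporary file is a hypothetical stand-in for Input_file built from the sample rows in the question, and SGE_TASK_ID defaults to 1 so the snippet runs outside the scheduler.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for Input_file, built from the sample rows in the question.
tmp_table=$(mktemp)
printf 'SampleID\tFactor\tCondition\tReplicate\tTreatment\tType\tDataset\tisPE\tReadLength\tisREF\tPathFASTQ\n' > "$tmp_table"
printf 'DG13\tfd3\tc1\t1\tcc\t0\t0102\t0\t50\t1\t"/path/to/fastq"\n' >> "$tmp_table"
printf 'DG14\tfd3\tc1\t1\tcc\t1\t0102\t0\t50\t1\t"/path/to/fastq"\n' >> "$tmp_table"

# Same awk program as above, reading the stand-in file.
myarr=($(awk '{gsub(/\r/,"")} match($NF,/\/[^"]*/){
  val=substr($NF,RSTART,RLENGTH)
  num=split(val,array,"/")
  print val"/"$1"."array[num]".gz"}' "$tmp_table"))

# SGE normally sets SGE_TASK_ID; default to 1 here so the sketch runs standalone.
SGE_TASK_ID=${SGE_TASK_ID:-1}

# Bash arrays are zero-indexed, so task N maps to element N-1.
infile="${myarr[SGE_TASK_ID-1]}"
echo "$infile"   # prints /path/to/fastq/DG13.fastq.gz when SGE_TASK_ID is 1

rm -f "$tmp_table"
```

Without the offset, task 1 would pick up the second entry (DG14), and the last task would get an empty string.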
Explanation of the awk code:
awk '                                 ##Start the awk program.
{gsub(/\r/,"")}                       ##Remove carriage returns (Control-M) from the current line.
match($NF,/\/[^"]*/){                 ##In the last field, match from the first / up to (not including) the next ".
  val=substr($NF,RSTART,RLENGTH)      ##Store the matched sub-string in val; it starts at RSTART and is RLENGTH characters long.
  num=split(val,array,"/")            ##Split val on / into array; num is the number of elements produced.
  print val"/"$1"."array[num]".gz"    ##Print val, a /, the first field, a dot, the last element of array, then .gz.
}
' Input_file                          ##Input_file is the file to read.
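Note that the program above takes the file extension from the last directory name in the path (fastq) and does not apply the isPE filter from the question. As a variation, not part of the original answer, the following sketch keeps the question's $8 == 0 condition and hard-codes the .fastq.gz suffix. It assumes bash 4 or later (for mapfile), tab-separated fields, and a hypothetical stand-in file; the DG15 row is invented only to show the filter at work.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for table.txt with CRLF line endings; DG15 is an invented row.
tmp_table=$(mktemp)
printf 'SampleID\tFactor\tCondition\tReplicate\tTreatment\tType\tDataset\tisPE\tReadLength\tisREF\tPathFASTQ\r\n' > "$tmp_table"
printf 'DG13\tfd3\tc1\t1\tcc\t0\t0102\t0\t50\t1\t"/path/to/fastq"\r\n' >> "$tmp_table"
printf 'DG14\tfd3\tc1\t1\tcc\t1\t0102\t0\t50\t1\t"/path/to/fastq"\r\n' >> "$tmp_table"
printf 'DG15\tfd3\tc1\t1\tcc\t1\t0102\t1\t50\t1\t"/path/to/fastq"\r\n' >> "$tmp_table"

# Read one output line per array element, without word splitting or globbing.
mapfile -t infiles < <(awk -F'\t' '
  NR > 1 && $8 == 0 {          # skip the header, keep rows where isPE is 0
    path = $11
    gsub(/[\r"]/, "", path)    # drop carriage returns and surrounding quotes
    print path "/" $1 ".fastq.gz"
  }' "$tmp_table")

printf '%s\n' "${infiles[@]}"

rm -f "$tmp_table"
```

Reading with mapfile -t rather than array=($(...)) keeps each output line as one element even if a path contains spaces.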