I am trying to run the programme Unicycler on multiple sets of fastq.gz files. Each assembly requires a set of three fastq.gz files that share a common ID, and each set is located in its own subdirectory whose name contains the same ID.
For example, the three files QC_141696.fastq.gz, QC_141696_1.fastq.gz and QC_141696_2.fastq.gz are required to run the assembly and are located in the subdirectory assem_141696. I have 10 more sets of 3 files organised in the same way; all 11 subdirectories are named assem_(ID) and are located in the parent directory Assemblies:
Sequencing/Assemblies/assem_(IDset1)/QC_(IDset1).fastq.gz
Sequencing/Assemblies/assem_(IDset1)/QC_(IDset1)_1.fastq.gz
Sequencing/Assemblies/assem_(IDset1)/QC_(IDset1)_2.fastq.gz
An example of the command I am trying to run, outside a loop, is:
unicycler --short1 QC_141696_1.fastq.gz --short2 QC_141696_2.fastq.gz --long QC_141696.fastq.gz --out QC_141696_hybrid --threads 16
I want to loop through each of the assem_(IDset*) subdirectories and run Unicycler using the three files located within it. The output directory should be created in the relevant assem_(IDset*) subdirectory.
This is the code that I have so far:
for file in Assemblies/assem*/*_1.fastq.gz;
do base=$(basename ${file} _1.fastq.gz)
echo "running unicycler hybrid assembly on ${base}"
unicycler --short1 ${base}_1.fastq.gz --short2 ${base}_2.fastq.gz --long ${base}.fastq.gz --out ${base}_hybridassem --threads 16
echo "unicycler assembly on ${base} finished"
done
I am running the code from within the Sequencing directory, but I get:
Error: could not find home/user/scratch/Sequencing/QC_181651_1.fastq.gz
So it seems that my code is not looking in the intended directories. Annoyingly, it works fine when I test it with echo.
Any help would be greatly appreciated!
Because your code runs from the Sequencing directory, it needs to build the paths to the input and output files for each of the assem_(IDset*) subdirectories. basename strips the directory part of the path, which is why Unicycler looked for the files directly in Sequencing. You can use the bash parameter expansion dir=${file%\/*} to extract the directory in your loop. (Note also that the base variable was renamed to id):
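#!/bin/bash
for file in Assemblies/assem*/*_1.fastq.gz ; do
    # Sample ID: file name without the directory and the _1.fastq.gz suffix
    id=$(basename "${file}" _1.fastq.gz)
    # Directory holding this set: everything before the last slash
    dir=${file%\/*}
    echo "running unicycler hybrid assembly on ${id}"
    # Prefix every input and the output with the subdirectory
    unicycler --short1 "${dir}/${id}_1.fastq.gz" --short2 "${dir}/${id}_2.fastq.gz" --long "${dir}/${id}.fastq.gz" --out "${dir}/${id}_hybridassem" --threads 16
    echo "unicycler assembly on ${id} finished"
done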
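To see what the two expansions produce before launching a long assembly, you can try them on a single path. This is only a small sketch: the path is hard-coded from the example in the question, the cmd variable is just an illustrative name, and the command is printed rather than executed.

```bash
#!/bin/bash
# Example path taken from the question, relative to the Sequencing directory
file="Assemblies/assem_141696/QC_141696_1.fastq.gz"

# basename drops the directory and the given suffix, leaving only the ID
id=$(basename "${file}" _1.fastq.gz)

# ${file%/*} removes the shortest trailing match of "/*", leaving the directory
dir=${file%/*}

# Build the command as a string so it can be inspected (dry run only)
cmd="unicycler --short1 ${dir}/${id}_1.fastq.gz --short2 ${dir}/${id}_2.fastq.gz --long ${dir}/${id}.fastq.gz --out ${dir}/${id}_hybridassem --threads 16"

echo "id:  ${id}"
echo "dir: ${dir}"
echo "${cmd}"
```

Here id is QC_141696 and dir is Assemblies/assem_141696. The original loop only used the basename result, so every file name was resolved against the current directory instead of the assem_ subdirectory.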