bash loops shell unix samtools

Using SAMtools and storing outputs for a large number of files


I have 500+ files that I need to convert from .bam to .sam, so I am trying to use samtools. I searched on here, found this answer (Changing file paths outputs within a loop, in a shell script), and modified it to fit my work:

input_files="/scratch/spectre/h/homeTCGA_Data/*.bam"
output_files="/scratch/spectre/h/home/Data_Sam"
for i in $input_files 
do
tmp=$(i/scratch/spectre/h/home/Data_Sam) 
samtools view -h $i > $(tmp/.bam/.sam)
done

I'm a complete novice at this, so I assume I've made an obvious mistake somewhere. The errors I get are that the directory 'Data_Sam' doesn't exist, along with 'ambiguous redirect'. I've checked, and the directory definitely does exist in scratch. I've also tried this, in case I'm overcomplicating things:

 for i in `ls ${/scratch/spectre/h/home/Data/}/*.bam`
 do
 samtools view -h <$i >${/scratch/spectre/h/home/Data_Sam}/$i.sam
 done

For this I get the error 'bad substitution' for ${/scratch/spectre/h/home/Data/}/*.bam.

I've also tried the following and get the error 'bad substitution':

for i in "ls ${/scratch/spectre/h/home/Data/}/*.bam";
do filename "${i%%.*}";
samtools view $i ${filename}.sam;
done

Is there any way that I can loop over my 500+ files, change them from bam to sam and store them somewhere new?


Solution

  • In your first attempt, the syntax you are looking for to substitute part of a string is ${parameter/pattern/string}: parameter is expanded, and the longest match of pattern in the result is replaced with string. Note that this syntax uses curly braces, not parentheses; $(...) is command substitution, which tries to run its contents as a command (see Shell Parameter Expansion in the Bash manual).

    I think this is what you were trying to do:

    input_files="/scratch/spectre/h/home/Data/*.bam"
    for i in $input_files 
    do
        tmp=${i/Data/Data_Sam}                 # replace 'Data' with 'Data_Sam'
        samtools view -h $i > ${tmp/.bam/.sam} # replace '.bam' with '.sam'
    done
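
    To see how the variants of this expansion differ, here is a small sketch that only manipulates a string. The path is made up for illustration, and no file needs to exist:

    ```shell
    #!/usr/bin/env bash
    # Illustrative path only; it is not taken from the question.
    path="/scratch/Data/sample.bam.sorted.bam"

    first=${path/.bam/.sam}     # replaces only the first match
    all=${path//.bam/.sam}      # '//' replaces every match
    suffix=${path/%.bam/.sam}   # '%' anchors the match to the end of the value

    printf '%s\n' "$first" "$all" "$suffix"
    ```

    The pattern is a shell glob, not a regular expression, so the dot in .bam matches a literal dot.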
    

    With small changes you can make it more robust: quoting the expansions protects file names that contain spaces, and the % in ${tmp/%.bam/.sam} anchors the pattern to the end of the name, so a ".bam" somewhere in the middle is left alone:

    input_files="/scratch/spectre/h/home/Data/*.bam"
    for i in $input_files 
    do
        tmp="${i/Data/Data_Sam}"
        samtools view -h "$i" > "${tmp/%.bam/.sam}"
    done
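
For the general case of reading from one directory and writing to another, the same idea could be wrapped in a function. This is only a sketch: convert_all is a hypothetical helper name, and it assumes samtools is on your PATH and that you want one .sam per .bam with the same base name.

```shell
#!/usr/bin/env bash
# Sketch: convert every .bam in one directory to a .sam in another.
# convert_all is a hypothetical helper name, not part of samtools.
convert_all() {
    local in_dir=$1 out_dir=$2 bam name
    # Create the output directory if it is missing.
    mkdir -p "$out_dir" || return 1
    # Loop over the glob directly; the quotes protect directory names with spaces.
    for bam in "$in_dir"/*.bam; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$bam" ] || continue
        name=${bam##*/}    # strip the directory part
        # -h keeps the header, -o names the output file.
        samtools view -h -o "$out_dir/${name%.bam}.sam" "$bam"
    done
}
```

With the directories from the question, the call would be convert_all /scratch/spectre/h/home/Data /scratch/spectre/h/home/Data_Sam. Building the output name from the base name avoids having to substitute inside the full path.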