regex, r, bash, sh, rscript

How to match the regex for the below pattern?


I am trying to write a script that should work as described below, but I cannot find the right syntax.

I have folders named like S_12_O_319_K4me1.

Each folder contains a file named like S_12_O_319_K4me1_S12816.sorted.bam.

I want the script to loop over the folders, find the *.bam file in each one, and run an operation on it, but I cannot get the regex right. This is what I tried:

#!/bin/bash
#$ -S /bin/bash

spp_run=/path/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output

samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3
S_12_O_319_K27ac"

for s in $samples; do

    echo "Running SPP on $s ..."
    Rscript $spp_run -c=$bam_loc/$s/${s}_S[[0-9]+\.sorted.bam -savp -out=$bam_loc/$s/${s}".run_spp.out"
done

The pattern above does not match the digits.

Where am I going wrong?

Edit: I tried the version below, but it still does not work. Rscript fails while parsing its arguments, and I don't understand why.

#!/bin/bash
#$ -S /bin/bash

spp_run=/path/tools/phantompeakqualtools/run_spp.R
bam_loc=/path/ChIP-Seq/output

samples="S_12_O_319_K27me3
S_12_O_319_K4me1
S_12_O_319_K4me3"

for s in $samples; do
    echo "Running SPP on $s ..."
    echo $bam_loc/$s/${s}_S*.sorted.bam
    inbam=$bam_loc/$s/${s}_S*.sorted.bam
    echo $inbam
    Rscript $spp_run -c=$inbam -savp -out=$bam_loc/$s/${s}".run_spp.out"
done
echo "done"

Error

Error in parse.arguments(args) :
  ChIP File:/path/ChIP-Seq/output/S_12_O_319_K27me3/S_12_O_319_K27me3_S*.sorted.bam does not exist
Execution halted

Rscript does not recognize the file, even though echo $inbam prints /path/ChIP-Seq/output/S_12_O_319_K27me3/S_12_O_319_K27me3_S12815.sorted.bam


Solution

  • I found an answer to my question; the code is below. Rscript needs the full path and the complete file name, so I captured the output of the echo command in a variable and passed that variable to Rscript as the input file argument. The variable now holds the full path with the full file name, so Rscript recognizes the input file.

    It is not an elegant approach, but it works for me.

    #!/bin/bash
    #$ -S /bin/bash
    
    spp_run=/path/tools/phantompeakqualtools/run_spp.R
    bam_loc=/path/ChIP-Seq/output
    
    samples="S_12_O_319_K27me3
    S_12_O_319_K4me1
    S_12_O_319_K4me3"
    
    for s in $samples; do
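        echo "Running SPP on $s ..."
        # The assignment stores the glob pattern as-is; it is not expanded here.
        inbam=$bam_loc/$s/${s}_S*.sorted.bam
        # Unquoted, the pattern expands to the matching file name.
        echo $inbam
        # Capture the expanded name so Rscript receives a real path.
        infile=$(echo $inbam)
        Rscript $spp_run -c=$infile -savp -out=$bam_loc/$s/${s}".run_spp.out"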
    done
    echo "done"
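
Why the extra echo step helps can be reproduced in isolation. The sketch below assumes the cause is how bash performs pathname expansion; the temporary directory and the empty file are stand-ins for the real data, not part of the original script. Bash does not expand a glob when it is assigned to a variable, and it matches an unquoted word as a whole. In -c=$inbam the pattern therefore includes the -c= prefix, nothing on disk matches it, and the literal * reaches Rscript. In echo $inbam there is no prefix, so the glob expands.

```shell
#!/bin/bash
# Stand-in for the real data: a temporary folder holding one empty "bam" file.
demo_dir=$(mktemp -d)
s=S_12_O_319_K27me3
mkdir -p "$demo_dir/$s"
touch "$demo_dir/$s/${s}_S12815.sorted.bam"

# The assignment stores the pattern itself; no expansion happens here.
inbam=$demo_dir/$s/${s}_S*.sorted.bam

# With the -c= prefix the whole word is the pattern, nothing matches it,
# and the asterisk is passed through unchanged.
with_prefix=$(printf '%s\n' -c=$inbam)

# Without a prefix the pattern matches the file and is expanded.
without_prefix=$(printf '%s\n' $inbam)

echo "with prefix:    $with_prefix"
echo "without prefix: $without_prefix"

# Remove the throwaway directory.
rm -rf "$demo_dir"
```

The first line printed still contains the asterisk, which is the same unexpanded path that appears in the Rscript error message.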
    

    Thanks, everyone, for the suggestions and comments.
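
A possible alternative to the echo workaround, sketched under the assumption that each sample folder holds exactly one matching .bam file: let the shell expand the glob into the positional parameters, check that exactly one existing file came back, and pass that name on in quotes. The helper name find_bam and the temporary demo directory are hypothetical and only serve the illustration; the Rscript call is printed rather than executed.

```shell
#!/bin/bash
# find_bam is a hypothetical helper: print the single file matching
# <dir>/<sample>/<sample>_S*.sorted.bam, or fail if there is not exactly one.
find_bam() {
    local dir="$1" sample="$2"
    # Expand the glob into the positional parameters of this function.
    set -- "$dir/$sample/${sample}_S"*.sorted.bam
    # An unmatched pattern stays literal, so also test that the file exists.
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one bam file for $sample" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Demo with a throwaway directory standing in for $bam_loc.
bam_loc=$(mktemp -d)
s=S_12_O_319_K4me1
mkdir -p "$bam_loc/$s"
touch "$bam_loc/$s/${s}_S12816.sorted.bam"

if infile=$(find_bam "$bam_loc" "$s"); then
    # Printed instead of executed, since run_spp.R is not available here.
    echo "would run: Rscript \$spp_run -c=$infile -savp -out=$bam_loc/$s/$s.run_spp.out"
fi

# Remove the throwaway directory.
rm -rf "$bam_loc"
```

In the real loop, the body would call Rscript "$spp_run" -c="$infile" with the same remaining arguments as before. The explicit check makes a missing or ambiguous file fail with a clear message instead of passing an unexpanded pattern to Rscript.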