I wrote this little Bash script that takes one argument (a filename) and is supposed to respond according to the file's extension:
#!/bin/bash
fileFormat=${1}
if [[ ${fileFormat} =~ [Ff][Aa]?[Ss]?[Tt]?[Qq]\.?[[:alnum:]]+$ ]]; then
echo "its a FASTQ file";
elif [[ ${fileFormat} =~ [Ss][Aa][Mm] ]]; then
echo "its a SAM file";
else
echo "its not fasta nor sam";
fi
It's run like this:
sh script.sh filename.sam
If it's a fastq (or FASTQ, or fq, or FQ, or fastq.gz (compressed)), I want the script to tell me "it's a fastq". If it's a sam, I want it to tell me it's a sam, and if it's neither, I want it to tell me it's neither sam nor fastq.
THE PROBLEM: before I considered the .gz (compressed) case, the script ran well and gave the result I expected, but something goes wrong when I add the last part of the regex to account for it (see the third line of the script, the part that says \.?[[:alnum:]]+ ). This part is meant to say "in the filename, after the extension (fastq in this case), there might be a dot plus some word afterwards".
My input is this:
sh script.sh filename.fastq.gz
And it works. But if I put: sh script.sh filename.fastq
It says it's not fastq. I wanted to make that last part optional, but if I add a "?" at the end it doesn't work. Any thoughts? Thanks! How can I fix that part so it works for both cases?
You may use this regex:
fileFormat="$1"
if [[ $fileFormat =~ [Ff]([Aa][Ss][Tt])?[Qq](\.[[:alnum:]]+)?$ ]]; then
echo "its a FASTQ file"
elif [[ $fileFormat =~ [Ss][Aa][Mm]$ ]]; then
echo "its a SAM file"
else
echo "its not fasta nor sam"
fi
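Here (\.[[:alnum:]]+)? makes the last group optional as a whole: a dot followed by one or more alphanumeric characters. In your pattern, \.? made only the dot optional, while [[:alnum:]]+ still required at least one character after the "q", so filename.fastq could never match. I also grouped ([Aa][Ss][Tt])? so that only "fq" and "fastq" are accepted, and anchored the SAM pattern with $.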
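To see the difference between the two patterns side by side, here is a minimal sketch. The variable names old and new and the helper matches are made up for this illustration; the regexes themselves are the one from the question and the one from this answer.

```shell
#!/bin/bash
# Pattern from the question: [[:alnum:]]+ is mandatory, so at least one
# character must follow the "q".
old='[Ff][Aa]?[Ss]?[Tt]?[Qq]\.?[[:alnum:]]+$'
# Pattern from this answer: the dot and the suffix are optional together.
new='[Ff]([Aa][Ss][Tt])?[Qq](\.[[:alnum:]]+)?$'

# matches PATTERN NAME: hypothetical helper, prints "match" or "no match".
matches() {
    # The pattern variable is left unquoted so Bash treats it as a regex.
    if [[ $2 =~ $1 ]]; then echo "match"; else echo "no match"; fi
}

for name in filename.fastq filename.fastq.gz; do
    echo "$name: old=$(matches "$old" "$name") new=$(matches "$new" "$name")"
done
```

Keeping the regex in a variable and expanding it unquoted on the right-hand side of =~ also avoids quoting surprises inside [[ ]].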
Running it gives:
./script.sh filename.fastq
its a FASTQ file
./script.sh fq
its a FASTQ file
./script.sh filename.fastq.gz
its a FASTQ file
./script.sh filename.sam
its a SAM file
./script.sh filename.txt
its not fasta nor sam
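Note that the pattern above is only anchored at the end of the string, so a bare "fq" (or any name that merely ends in those letters) also counts as FASTQ. If you would rather require a real extension, one possible variant is sketched below. It assumes you want a literal dot before the extension and case-insensitive matching; the function name classify and the variable names are made up for this example.

```shell
#!/bin/bash
# classify NAME: hypothetical helper, prints FASTQ, SAM, or other.
# The body is wrapped in ( ) so it runs in a subshell and the
# nocasematch setting does not leak into the caller.
classify() (
    shopt -s nocasematch    # make =~ ignore case
    fastq_re='\.(fq|fastq)(\.[[:alnum:]]+)?$'   # .fq or .fastq, optional extra suffix
    sam_re='\.sam$'                              # .sam at the very end
    if [[ $1 =~ $fastq_re ]]; then
        echo "FASTQ"
    elif [[ $1 =~ $sam_re ]]; then
        echo "SAM"
    else
        echo "other"
    fi
)

classify "$1"
```

With this variant, filename.FQ.gz is reported as FASTQ, while a bare fq is reported as other because there is no dot in front of it.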