I have such a fasta file :
>gnl|SRA|SRR035294.8571.2 FIHSSUW01ASCWS.2 length=224
GAGATGAAATAGATCTTGGCATATATGTACATGCTTGATCTCAGTTTTGATTGGATTTTATCCATTTTAG
CTATCTTAACTATTAATCTTGAAATGAAGCTTTAATTTATGTAGGAAGTTTATGAAATTTAGGAAAAAAA
AAGAAAAAAACAAAACAATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGGCAAGGCA
CACAGGGGATAGGN
>gnl|SRA|SRR035294.8572.2 FIHSSUW01ETZME.2 length=254
ACTAACCAGGTGGTAAACAACTACTACAGGCCAGATTTGAAGAAGGCTGCTCTTGCTAGATTGAGTGCAG
TGAACAGAAGCCTTAAGGTTTCAAAGTCTGGTGTGAAGAAGAAGAACAGACAGGCAGTTAGGATCCATGG
TAGGAAGTGAAGCTGTGATTTGCCTACCGTCTGATATTCATCGTATCACTTTCTAGCTGTTCCGTCTTGT
TTGGCAAGTGTTTGGTTTTACGTGCGAGTAGTTATATGTTGCGC
>gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230
AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGGATGTACCAATTCAAAAAGAAAACAGCAGTT
GGGGGCAAAACAATTAAGTTGTAACGAATGCATATATATGATTAATCTTCTAACACATTATTTTTGTCTC
AAAAAAAAAGAAAAAAAACAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGG
CAAGGCACACAGGGGATAGG
>gnl|SRA|SRR035294.8574.2 FIHSSUW01EHI3P.2 length=153
TGCAAGTTTACAACTTAAAACAACTTTTCTCACAGTGAACAATAAATTTATCAATTCTCATGCAAAAAAA
AAGAAAAAAACAAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGGCAAGGCA
CACAGGGGATAGG
>gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287
AACAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGAGATTACAGGTATTGCAAGTTTCAAGCCTGTC
ATAAAGACTCAAAGCCGCTTGTAATTTGTGTTTCCTAGTTGGGGAAGCTGTTTGTTCTTTATTGTGCTAT
ATGTATTTATTTGAAAGTTTGGATGAACTCAATAAATAAAAGAAAATCTTCATTGTGGGTTACAATTTGG
ACATGAACATGCATGAATAATGTACCAATTTAGCAAAAAAAAAGAAAAAAACAAAAAACAAATAGTCGGC
CGGCCCG
>gnl|SRA|SRR035294.8576.2 FIHSSUW01C911A.2 length=265
TATTCTCAGGTACGAAATATGAGTTTGCTGATAAATTGATGGATTGGGAATCAGCCTGCATAATAAGATA
TTCCCAATTAACTTTGCCCGTTAGTTCTTTTAGCTTTTCCTTTAAAGGCACGAGTCTTTCAACCAAAACA
TTACAGCAAAGTCTAACTGCCTCACAGCTTGCTTCAGAAGTTGTACCCCCGGCCGTAATGGCCACTCTGC
GTTGATACCACTGCTTCTGAGACACGCAACAGGGGATAGGCAAGGCACACAGGGG
and i have wrote this script in bash
STRING=$1
FILE=$(pwd)"/"$2
if [ -z "$STRING" ]
then
echo "Usage: fastaFind.sh <query> <fasta file>"
else
echo ""
awk 'BEGIN { RS = ">" } ; $0 ~ "'$STRING'" { print $0 }' "$FILE"
fi
I am running this command
fastaFind.sh "gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230" file.fasta
but it returns me an error for unterminated string. What i want to achieve is to retrieve after the execution of the command the specific sequence of the query. e.g
>gnl|SRA|SRR035294.8573.2 FIHSSUW01AZA99.2 length=230
AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGGATGTACCAATTCAAAAAGAAAACAGCAGTT
GGGGGCAAAACAATTAAGTTGTAACGAATGCATATATATGATTAATCTTCTAACACATTATTTTTGTCTC
AAAAAAAAAGAAAAAAAACAAAACATGTCGGCCGCCTCGGTCTCTACTGAGACACGCAACAGGGGATAGG
Your awk
command would better be:
awk 'BEGIN{ ORS = ""; RS = ">"; FS="\n" } $1 == "pattern" { print ">" $0 }' file
Or
awk -v p="pattern" 'BEGIN {ORS = ""; RS = ">"; FS = "\n" } $1 == p { print ">" $0 }' file
And your shell script be:
#!/bin/bash
STRING=$1
FILE=$2
if [[ -z $STRING ]]; then
echo "Usage: fastaFind.sh <query> <fasta file>"
else
awk -v p="$STRING" 'BEGIN{ ORS=""; RS=">"; FS="\n" } $1 == p { print ">" $0 }' "$FILE"
fi
Example usage:
bash temp.sh 'gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287' temp.txt
Output:
>gnl|SRA|SRR035294.8575.2 FIHSSUW01EWK4S.2 length=287
AACAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGAGATTACAGGTATTGCAAGTTTCAAGCCTGTC
ATAAAGACTCAAAGCCGCTTGTAATTTGTGTTTCCTAGTTGGGGAAGCTGTTTGTTCTTTATTGTGCTAT
ATGTATTTATTTGAAAGTTTGGATGAACTCAATAAATAAAAGAAAATCTTCATTGTGGGTTACAATTTGG
ACATGAACATGCATGAATAATGTACCAATTTAGCAAAAAAAAAGAAAAAAACAAAAAACAAATAGTCGGC
CGGCCCG