
Extract a specific gene from several prokka-annotated sequences


I annotated 500 sequences with Prokka, and I need to extract only the TcdA gene from each of them, using the annotation in each sequence's .ffn file.

How can I do this automatically, without having to open the folder of each annotated sequence?

Prokka files:

Strain1

  >Strain1.err
  >Strain1.faa
  >Strain1.fna
  >Strain1.ffn *I use this file to extract the gene*
  

I need the TcdA gene from all 500 sequences, for example:

  >Strain1_01428 glycosylating toxin TcdA
  ATGTCTTTAATATCTAAAGAAGAGTTAATAAAACTCGCATATAGCATTAGACCAAGAGAA
  AATGAGTATAAAACTATATTAACTAATTTAGACGAATATAATAAGTTAACTACAAACAAT
  AATGAAAATAAATATTTACAATTAAAAAAACTAAATGAATCAATTGATGTTTTTATGAAT
  AAATATAAAAATTCAAGCAGAAATAGAGCACTCTCTAATCTAAAAAAAGATATATTAAAA
  GAAGTAATTCTTATTAAAAATTCCAATACAAGTCCTGTAGAAAAAAATTTACATTTTGTA


Solution

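  • Something like the following. The redirection goes after the loop so that all matches are collected in one file instead of each .ffn overwriting the previous result. If every .ffn sits in its own Prokka folder, a glob such as /path/to/*/*.ffn may be needed:

    for i in /path/to/*.ffn; do awk 'BEGIN {RS=">"} /glycosylating toxin TcdA/ {printf ">%s", $0}' "$i"; done > TcdA.fasta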
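
If the .ffn files are spread over one folder per strain, the same idea can be wrapped in a small function that searches the folders recursively. This is only a sketch: the function name extract_gene and its two arguments are made up for illustration, and it assumes the product name appears literally in the FASTA header, as in the Strain1_01428 record in the question.

```shell
#!/bin/sh
# extract_gene ROOT_DIR PRODUCT  (hypothetical helper, not part of Prokka)
# Prints every FASTA record whose header line contains PRODUCT,
# taken from all .ffn files found anywhere below ROOT_DIR.
extract_gene() {
    root=$1
    product=$2
    # sort gives a stable, reproducible order of the input files
    find "$root" -type f -name '*.ffn' | sort | while IFS= read -r ffn; do
        awk -v product="$product" '
            # On a header line, decide whether the record is wanted
            # (index does a literal substring match, not a regex match).
            /^>/ { keep = (index($0, product) > 0) }
            # Print the header and sequence lines of wanted records only.
            keep
        ' "$ffn"
    done
}
```

It could then be used as extract_gene /path/to "glycosylating toxin TcdA" > TcdA.fasta, and grep -c '^>' TcdA.fasta reports how many records were found. A count different from 500 would suggest that some strains lack the gene, carry more than one copy, or have it annotated under a different product name.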