I have gene sequence file and I would like to change the header of each gene. Here is the input:
>lcl|CP000046.1_cds_AAW37389.1_1 [gene=dnaA] [locus_tag=SACOL0001] [protein=chromosomal replication initiator protein DnaA] [protein_id=AAW37389.1] [location=544..1905] [gbkey=CDS]
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACTCAACTTTCCTAAAAGATACTGAGCTTTACACGATTAAAGATGGTGAAGCTATCGTATTATCGAGTATTCCTTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATTATCCAAGCAATCTTATTTGATGTTGTAGGCTATGAAGTTAAACCTCACTTTATTACTCTGAAGAATTAGCAAATTATAGTAATAATGAAACTGCTACTCCAAAAGAAACAACAAAACCTTCTACTGAAACAACTGAGGATAATCATGTGCTTGGTAGAGAGCAATTCAATGCCCATAACACATTTGACACTTTTGTAATCGGACCCGGTAACCGCTTTCCACATGCAGCGAGTTTAGCTGTGGCCGAAGCACCAGCCAAAGCGTACAATCCATTATTTATCTATGGAGGTGTTGGTTTA
>lcl|CP000046.1_cds_AAW37390.1_2 [gene=dnaN] [locus_tag=SACOL0002] [protein=DNA polymerase III, beta subunit] [protein_id=AAW37390.1] [location=2183..3316] [gbkey=CDS]
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCACCAAGAACAACATTACCTATATTAACTGGTATCAAAATCGATGCGAAAGAACATGAAGTTATATTAACTGGTTCAGACTCTGAAATTTCAATAGAAATCACTATTCCTAAAACTGTAGATGGCGAAGATATTGTCAATATTTCAGAAACAGGCTCAGTAGTACTTCCTGGACGATTCTTTGTTGATATTATAAAAAAATTACCTGGTAAAGATGTTAAATTATCTACAAATGAACAATTCCAGACATTAATTACATCAGGTCATTCTGAATTTAATTTAAGTGGCTTAGATCCAGATCAATATCCTTTATTACCTCAAGTTTCTAGAGATG
Expected Output:
>Saureus1|SACOL0001
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACTCAACTTTCCTAAAAGATACTGAGCTTTACACGATTAAAGATGGTGAAGCTATCGTATTATCGAGTATTCCTTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATTATCCAAGCAATCTTATTTGATGTTGTAGGCTATGAAGTTAAACCTCACTTTATTACTCTGAAGAATTAGCAAATTATAGTAATAATGAAACTGCTACTCCAAAAGAAACAACAAAACCTTCTACTGAAACAACTGAGGATAATCATGTGCTTGGTAGAGAGCAATTCAATGCCCATAACACATTTGACACTTTTGTAATCGGACCCGGTAACCGCTTTCCACATGCAGCGAGTTTAGCTGTGGCCGAAGCACCAGCCAAAGCGTACAATCCATTATTTATCTATGGAGGTGTTGGTTTA
>Saureus1|SACOL0002
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCACCAAGAACAACATTACCTATATTAACTGGTATCAAAATCGATGCGAAAGAACATGAAGTTATATTAACTGGTTCAGACTCTGAAATTTCAATAGAAATCACTATTCCTAAAACTGTAGATGGCGAAGATATTGTCAATATTTCAGAAACAGGCTCAGTAGTACTTCCTGGACGATTCTTTGTTGATATTATAAAAAAATTACCTGGTAAAGATGTTAAATTATCTACAAATGAACAATTCCAGACATTAATTACATCAGGTCATTCTGAATTTAATTTAAGTGGCTTAGATCCAGATCAATATCCTTTATTACCTCAAGTTTCTAGAGATG
I know how to delete a line congaing specific word with sed
sed '/^>/ d' inputfile > outputfile
But I am not getting any Idea to get the expected output. Here, in first part I should delete all the text in the gene header except SACOL00 and later preceding that I should keep fasta sysmbol ">" with Strain name.
If this kind of question repeated please excuse me.
With GNU sed:
sed -E 's/^>.*locus_tag=([^]]*).*/Saureus1|\1/' file
With sed:
sed 's/^>.*locus_tag=\([^]]*\).*/Saureus1|\1/' file