I have a fasta with headers like this:
tr|Q7MX99|Q7MX99_PORGI_BACT
I would like them to say:
tr|Q7MX99|Q7MX99_PORGI_BACT_ORALMICROBIOME
So basically, whenever I have PORGI_BACT I want to append _ORALMICROBIOME to each instance.
I'm sure there is an easy fix through the terminal, but I can't seem to find it.
My first idea is to do something like:
sed 's/>.*/&_ORALMICROBIOME/' file.fa > outfile.fa
BUT I only want to add to specific header endings, and that is where I'm stuck.
You are almost close. Would you please try the following:
sed 's/^>.*PORGI_BACT/&_ORALMICROBIOME/' file.fa > outfile.fa
[Edit]
According to the OP's requirement, how about:
sed -E 's/^>.*(PORGI_BACT|HUMAN_MAM|TESTA_BACT)/&_ORALMICROBIOME/' file.fa > outfile.fa
Sample input as file.fa
:
>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>SEQ1|tr|Q7MX88|Q7MX88_HUMAN_MAM
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>SEQ2|tr|Q7MX77|Q7MX77_TESTA_BACT
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
Output:
>SEQ0|tr|Q7MX99|Q7MX99_PORGI_BACT_ORALMICROBIOME
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>SEQ1|tr|Q7MX88|Q7MX88_HUMAN_MAM_ORALMICROBIOME
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>SEQ2|tr|Q7MX77|Q7MX77_TESTA_BACT_ORALMICROBIOME
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>SEQ3|tr|Q7MX66|Q7MX66_DUMMY
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK