
Add words at beginning and end of the same line for the FASTA header line with sed


I have the following FASTA file:

>A_1000
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC

I would like to convert the first line as follows:

>Initialword/A_1000/Finalword
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC

I found a similar question (Add words at beginning and end of a FASTA header line with sed) whose answer let me add text to the beginning and end of the header as I needed. However, Finalword ends up on the next line.

I ran the following:

 sed 's%^>(.*)%>Initialword/\1/Finalword%' input.fasta > output.fasta

Which returns:

>Initialword/A_0101M/Finalword 
ACTTTCGATCTCTTGTAGATCTGTTCTC...CACM
ACTTTCGATCTCTTGTAGATCTGTTCTC...CACM

But in the FASTA file it looks like:

>Initialword/A_0101 
/Finalword 
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC

How can I fix this to just add the text to the beginning and end of the header? What is the M at the end of each line in the file?

Thank you


Solution

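  • The M (usually displayed as ^M) is a carriage return: your file has DOS/Windows (CRLF) line endings, so (.*) captures the carriage return as well and Finalword is written after it. First convert your file to Unix line endings with dos2unix, and then use GNU sed with -E so that (.*) is treated as a capture group: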

    dos2unix <input.fasta | sed -E 's%^>(.*)%>Initialword/\1/Finalword%' >output.fasta
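    If dos2unix is not installed, a sed-only variant may work as well. The following is a minimal sketch, assuming GNU sed (which understands \r in a regex); the file names sample_crlf.fasta and sample_fixed.fasta are made up for illustration. It builds a small CRLF file, shows the hidden carriage returns, and then strips them in the same sed call that rewrites the header:

```shell
# Build a small CRLF-terminated sample (hypothetical file name).
printf '>A_1000\r\nACTTTCGATC\r\nACTTTCGATC\r\n' > sample_crlf.fasta

# Make the hidden carriage returns visible: cat -v prints each one as ^M.
cat -v sample_crlf.fasta

# First expression: drop a trailing carriage return from every line.
# Second expression: wrap header lines (those starting with ">").
# \r in the pattern is a GNU sed extension, so this is not portable to every sed.
sed -E 's/\r$//; s%^>(.*)%>Initialword/\1/Finalword%' sample_crlf.fasta > sample_fixed.fasta

# Inspect the result; no ^M should remain.
cat -v sample_fixed.fasta
```

    Running cat -v on your own input.fasta in the same way is a quick check of whether the file really has Windows line endings before you convert it.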