Add words at beginning and end of a FASTA header line with sed


I have the following FASTA record:

>XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGTCAAATCAATGATATGATTTTATCTCTTCTTAGTAAAGGTAGACTTATAATTAG
AGAAAACAAC

I would like to convert the first line as follows:

>INITWORD/XXX-220_5004_COVID-A6/FINALWORD
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGT...

So far I have managed to add the first word as follows:

sed 's/>/>INITWORD\//I'

That returns:

>INITWORD/XXX-220_5004_COVID-A6
TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA
AGAAGGT

How can I add FINALWORD at the end of the first line?


Solution

  • Just substitute more. sed conveniently allows you to recall the text you matched with a back reference, so just embed that between the things you want to add.

    sed 's%^>\(.*\)%>INITWORD/\1/FINALWORD%I' file.fasta
    

    I also added a ^ beginning-of-line anchor, and switched to % delimiters so the slashes don't need to be escaped.

    In some more detail, the s command's syntax is s/regex/replacement/flags where regex is a regular expression to match the text you want to replace, and replacement is the text to replace it with. In the regex, you can use grouping parentheses \(...\) to extract some of the matched text into the replacement; so \1 refers to whatever matched the first set of grouping parentheses, \2 to the second, etc. The /flags are optional single-character specifiers which modify the behavior of the command; so for example, a /g flag says to replace every match on a line, instead of just the first one (but we only expect one match per line so it's not necessary or useful here).

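    The I flag is a non-standard GNU extension that makes the match case-insensitive. The regex here contains no letters, so the flag has no effect and can be dropped; I kept it only because your command used it.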
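
As a minimal sketch of the answer above, the following script runs the substitution on a shortened copy of the record from the question. The file name sample.fasta and the variables tmpdir, result and result2 are made up for illustration. The I flag is left out so the commands stay within POSIX sed. The second command is an alternative (not from the answer) that adds the two words in separate steps, using the $ end-of-line anchor restricted to header lines.

```shell
# Build a small sample file (hypothetical name) from the question's data.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '>XXX-220_5004_COVID-A6' \
  'TTTATTTGACATGAGTAAATTTCCCCTTAAATTAAGGGGTACTGCTGTTATGTCTTTAAA' \
  'AGAAAACAAC' > "$tmpdir/sample.fasta"

# Capture the header text in \(...\) and re-emit it as \1 between the two words.
# Sequence lines do not start with ">" and pass through unchanged.
result=$(sed 's%^>\(.*\)%>INITWORD/\1/FINALWORD%' "$tmpdir/sample.fasta")

# Alternative: prefix first, then append at end-of-line ($) only on header lines.
result2=$(sed -e 's%^>%>INITWORD/%' -e '/^>/ s%$%/FINALWORD%' "$tmpdir/sample.fasta")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Both commands should leave the sequence lines untouched and rewrite only the header. The single-substitution form is shorter; the two-step form may be easier to read if the prefix and suffix are changed independently.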