
Editing UniRef FASTA header IDs


I'm trying to get the headers of UniRef FASTA files into the form ">ref|myid|seq definition". I believe this can be done with sed, but I'm not sure how.

Current header in the UniRef FASTA file:

">UniRef100_Q6GZX4 Putative transcription factor 001R n=1 Tax=Frog virus 3 
(isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD"

Desired result:

">UniRef100|Q6GZX4|Putative transcription factor 001R n=1 Tax=Frog virus 3
 (isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD"

Any pointers would be appreciated. Thanks.


Solution

  • Try this with sed to replace the first _ with | and the first space with | on each line:

    sed 's/_/|/;s/ /|/' file > new_file
    

    or this to edit the file in place (the -i option as written here is GNU sed syntax):

    sed -i 's/_/|/;s/ /|/' file
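
A possible refinement, sketched under the assumption that each header sits on a single line starting with ">": adding a `/^>/` address restricts both substitutions to header lines, so sequence lines are never touched even if one happened to contain an underscore or a space. The file names `sample.fasta` and `sample_edited.fasta` are made up for illustration.

```shell
# Build a small test file (hypothetical name) with one header and one sequence line.
printf '%s\n' \
  '>UniRef100_Q6GZX4 Putative transcription factor 001R n=1 Tax=Frog virus 3 (isolate Goorha) RepID=001R_FRG3G' \
  'MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS' > sample.fasta

# On lines starting with ">", replace the first "_" and then the first space with "|".
# All other lines are passed through unchanged.
sed '/^>/{s/_/|/;s/ /|/;}' sample.fasta > sample_edited.fasta

# Show the result.
cat sample_edited.fasta
```

Only the first underscore is replaced, so the later one in `RepID=001R_FRG3G` stays as it is. With GNU sed, the same expression can be combined with `-i` to modify the file in place.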