I'm trying to convert the headers of UniRef FASTA files to the ">ref|myid|seq definition" form. I understand this can be done with the sed command.
Current header of the UniRef FASTA:
">UniRef100_Q6GZX4 Putative transcription factor 001R n=1 Tax=Frog virus 3
(isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD"
Desired header:
">UniRef100|Q6GZX4|Putative transcription factor 001R n=1 Tax=Frog virus 3
(isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD"
Any pointers would be appreciated. Thanks.
Try this with GNU sed to replace the first _ on each line with | and the first space on each line with |:
sed 's/_/|/;s/ /|/' file > new_file
or this to edit the file in place:
sed -i 's/_/|/;s/ /|/' file
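Note that both commands run the substitutions on every line, not only on headers. That is harmless as long as the sequence lines contain no underscores or spaces, but if you want to be safe you could restrict the edit to header lines. A minimal sketch, assuming each header sits on a single line starting with ">" (the file names sample.fasta and sample_renamed.fasta are made up for illustration):

```shell
# Build a small sample file (hypothetical name) with one header and one sequence line.
printf '%s\n' \
  '>UniRef100_Q6GZX4 Putative transcription factor 001R n=1 Tax=Frog virus 3 (isolate Goorha) RepID=001R_FRG3G' \
  'MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS' > sample.fasta

# The address /^>/ selects header lines only; the braces group both
# substitutions so they run just on those lines. Without the g flag,
# each s command replaces only the first match on the line.
sed '/^>/{s/_/|/;s/ /|/;}' sample.fasta > sample_renamed.fasta
```

Because only the first underscore is replaced, the one in RepID=001R_FRG3G is left alone, and the sequence lines pass through unchanged.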