I have a fasta file like this:
>rna-XM_00001.1
actact
>rna-XM_00002.1
atcatc
How do I remove the 'rna-' so it becomes:
>XM_00001.1
actact
>XM_00002.1
atcatc
If what you're showing is the file contents, then sed should be able to do this:
sed 's/^>rna-/>/' < inputfile > outputfile
Explanation:
s tells sed to do a substitution
/ are the delimiters
^ tells sed to look only at the start of a line
>rna- is the pattern to match at the start of a line
> is the replacement substituted for the pattern

If, instead, you want to always remove the first four characters after a > as long as they end in -, you could use:
sed 's/^>...-/>/' < inputfile > outputfile
Explanation:
The pattern >...- is a regexp, where a . matches any single character. So this pattern matches any line starting with >, followed by any three characters, followed by -.
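
To see how the two commands differ, here is a small sketch you could run in a scratch directory. The file names (demo.fasta, literal.fasta, anyprefix.fasta) and the extra >cds- record are made up for illustration; they are not from your data.

```shell
# Build a small sample file; the third record has a different
# prefix (hypothetical) to show where the two patterns diverge.
printf '%s\n' \
  '>rna-XM_00001.1' 'actact' \
  '>rna-XM_00002.1' 'atcatc' \
  '>cds-XM_00003.1' 'ggcggc' > demo.fasta

# Literal prefix: only headers that start with ">rna-" are changed.
sed 's/^>rna-/>/' < demo.fasta > literal.fasta

# Any three characters followed by "-": ">cds-" is stripped as well.
sed 's/^>...-/>/' < demo.fasta > anyprefix.fasta

# Show only the header lines of each result for comparison.
grep '^>' literal.fasta
grep '^>' anyprefix.fasta
```

With the literal pattern, the third header stays as >cds-XM_00003.1; with the ... pattern, it becomes >XM_00003.1. The sequence lines are untouched in both cases because they do not start with >.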