I have FASTA files with headers in two patterns, like this:
>256_Org1
MAVVIIKDAADDSLARRD
>Org2_10005
DSLARRDMAVVIIKDAA
I want to keep only the names and remove the numbers. I tried the awk one-liners that were suggested, but splitting on the delimiter '_' and following with {print $1} gives 256 (wrong) or Org2 (right). The output I expect is:
>Org1
MAVVIIKDAADDSLARRD
>Org2
DSLARRDMAVVIIKDAA
In TextWrangler, I can do the replacement in two steps: first \>\d+\_ to >, then \_\d+\n to \n. But I have several hundred files and would like to use a one-liner. Any suggestions?
With GNU sed:
sed -E 's/^>[0-9]+_/>/; s/_[0-9]+ *$//' file
Output:
>Org1
MAVVIIKDAADDSLARRD
>Org2
DSLARRDMAVVIIKDAA
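Since the question mentions several hundred files, here is a minimal sketch of how the same substitutions might be applied in a loop. It assumes the inputs are *.fasta files in the current directory and writes the results to a hypothetical cleaned/ subdirectory; the function name clean_headers is also just illustrative. As a small variation on the command above, the substitutions are restricted to header lines (those starting with ">") so that sequence lines are never modified.

```shell
#!/bin/sh
# Strip the numeric part from FASTA headers such as ">256_Org1" or ">Org2_10005".
# The /^>/ address limits both substitutions to header lines;
# sequence lines are printed unchanged.
clean_headers() {
    sed -E '/^>/ { s/^>[0-9]+_/>/; s/_[0-9]+ *$//; }' "$1"
}

# Assumption: inputs are *.fasta files in the current directory,
# and output goes to a separate "cleaned" directory (hypothetical name).
for f in ./*.fasta; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    mkdir -p cleaned
    clean_headers "$f" > "cleaned/${f##*/}"
done
```

Writing to a separate directory leaves the originals untouched, so the results can be checked before anything is overwritten. With GNU sed, sed -i would edit the files in place instead.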