I have a fasta file with multiple headers:
>CABITT030000001.1 genome assembly, contig: 1, whole genome shotgun sequence
>CABITT030000002.1 genome assembly, contig: 2, whole genome shotgun sequence
.
.
.
.
And I would like to keep only the 1 and 2, taken either from the CABITT03000000*.1 identifier or from the number after the contig: string.
Output:
>1
>2
I tried it with the sed command below, but it doesn't work.
sed 's/>.*/>1/' fasta.fa > newfasta.fa
Going on the example input you provided, this should work:
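sed -e 's/.* contig: \([[:digit:]]\{1,\}\).*/>\1/' fasta.fa
>1
>2
This uses a character class for digits ([[:digit:]]), repeated one or more times with \{1,\} so that multi-digit contig numbers such as 10 are kept whole, and a capture group (\( \)) that is referenced with \1 in the replacement.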
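To check the behaviour end to end, here is a minimal sketch that builds a small test file and writes the result to a new file. The sample sequences, the contig numbered 12, and the temporary directory are assumptions made up for illustration; only the header layout comes from the question. The `/^>/` address limits the substitution to header lines, so sequence lines pass through untouched.

```shell
# Hypothetical test data: two records, one with a multi-digit contig number.
tmpdir=$(mktemp -d)
cat > "$tmpdir/fasta.fa" <<'EOF'
>CABITT030000001.1 genome assembly, contig: 1, whole genome shotgun sequence
ACGTACGT
>CABITT030000012.1 genome assembly, contig: 12, whole genome shotgun sequence
TTGGCCAA
EOF

# On header lines only, keep the run of digits that follows " contig: ".
sed -e '/^>/ s/.* contig: \([[:digit:]]\{1,\}\).*/>\1/' \
    "$tmpdir/fasta.fa" > "$tmpdir/newfasta.fa"

cat "$tmpdir/newfasta.fa"
```

Redirecting to a separate file, as in the question, leaves the original FASTA intact so the two can be compared before the original is replaced.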