Given a file in FASTA format (your.file), for example:
>Code1234_length1
ABCEDLKSDJFABCEDLKSDJFABCEDLKSDJFABCEDLKSDJFABCEDLKSDJF
>Code1335_length2
AJDHIEUNAJDHIEUNAJDHIEUNAJDHIEUNAJDHIEUNAJDHIEUN
In practice, the sequence after >Code1234_length1 is unknown (it is shown here only to make the example reproducible).
I would like to extract the header line >Code1234_length1 together with everything after it, up to but not including the next >, and write the result to a new file.
The expected output is:
>Code1234_length1
ABCEDLKSDJFABCEDLKSDJFABCEDLKSDJFABCEDLKSDJFABCEDLKSDJF
How could this be done? Thank you.
If awk is an option, you could try:
awk '
$1 == ">Code1234_length1" {f = 1; print; next}   # if the header matches exactly, set the flag,
                                                 # print the line and continue with the next line
f {                                              # if the flag is set
    if (/^>/) f = 0                              # the next ">" header ends the record, so reset the flag
    else print                                   # otherwise print the sequence line
}
' your.file > new.file
It also works when the sequence following the >Code1234_length1 header is wrapped over multiple lines.
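If you need to pull out different records, one variation is to pass the ID in with -v instead of hard-coding it in the pattern. This is a minimal sketch, assuming the ID is the first whitespace-separated field of the header line; extract_record is a hypothetical helper name, not an existing tool. Comparing $1 exactly (rather than matching a regex) means a header such as >Code1234_length10 is not picked up by accident, and characters like . or | in an ID need no escaping (note that -v does interpret backslash escapes in the value).

```shell
#!/bin/sh
# Hypothetical helper: print the FASTA record whose header ID is $1 from file $2.
# Assumes the ID is the first whitespace-separated field of the ">" line.
extract_record() {
    awk -v id="$1" '
        # On every header line, decide whether this is the wanted record
        /^>/ { f = ($1 == (">" id)) }
        # While the flag is set, print the line (header and sequence lines alike)
        f
    ' "$2"
}

# Usage: extract_record Code1234_length1 your.file > new.file
```

Because the flag is only re-evaluated on header lines, every sequence line in between is printed, so wrapped sequences come out whole.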