I have a file generated by assemblers. It looks like following.
>NODE_1_length_211_cov_22.379147
CATTTGCTGAAGAAAAATTACGAGAAATGGAGCACAAGGCTGTTTTTGTGAATGTCAAAC
CAAGTGACAACTCTATAGCGTTTGTATAAGACTCTCATACTAATCCCAAGCAAACTCTAT
ACTGACGCATGAACATGGAAGAGAAATGCTGCTCGTGTATGTATTATGGACCAGCTTGGA
ACACCATGTTAGGACTTTATAGATGTCTTACGATTTTTTCGACGTGATGAAGAAGTCTAT
TCAGCATTTGA
>NODE_2_length_85_cov_19.094118
TACTCCTGAGCACTTTGTGCTCTTAGTTCTTACTAGAACTGTTACAGCTCCACGAACTTG
TCGACTCTTTGAGTCAATTTCTGTTAGTTCCTACGAACTAAGAGGCTCTCTGAGCCCAGT
CTTCC
I want to merge the lines using python or linux sed command and want result in this way.
>NODE_1_length_211_cov_22.379147
CATTTGCTGAAGAAAAATTACGAGAAATGGAGCACAAGGCTGTTTTTGTGAATGTCAAACCAAGTGACAACTCTATAGCGTTTGTATAAGACTCTCATACTAATCCCAAGCAAACTCTATACTGACGCATGAACATGGAAGAGAAATGCTGCTCGTGTATGTATTATGGACCAGCTTGGAACACCATGTTAGGACTTTATAGATGTCTTACGATTTTTTCGACGTGATGAAGAAGTCTATTCAGCATTTGA
>NODE_2_length_85_cov_19.094118
TACTCCTGAGCACTTTGTGCTCTTAGTTCTTACTAGAACTGTTACAGCTCCACGAACTTGTCGACTCTTTGAGTCAATTTCTGTTAGTTCCTACGAACTAAGAGGCTCTCTGAGCCCAGTCTTCC
like every seqeunce consider as single line and Node name as other line.
You can use awk
to do the job:
awk < input_file '/^>/ {print ""; print; next} {printf "%s", $0} END {print ""}'
This only starts one process (awk
). Only drawback: it adds an empty first line. You can avoid such things by adding a state variable (the code belongs on one line, it's just to make it better readable):
awk < input_file '/^>/ { if (flag) print ""; print; flag=0; next }
{ printf "%s", $0; flag=1 } END { if (flag) print "" }'
@how to store it in a new file:
awk < input_file > output_file '/^>/ { .... }'