How can I reformat multi-line sequences in a FASTA file so that each sequence is on a single line?

Input "file.fasta" (note: this is only a sample; in the real FASTA file, a sequence may span more than three lines):

>chr1:117223140-117223856 TAG:GTGGG
GTGGgggggcgCATAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGAGtt
aGTAGTATCGAATCGCACGACTGACAGCTCAGCATCAGCGACGACTAGTG
GTGGGCGACGACAgCGATATA
>chr2:117223140-117223856 TAG:GGGCT
ACGAGCAGCAGCAGCAGCagCCGATCGACGACTCAAGTACGATACGCGaa
cCCCCCGACGACGACTCACGA

Expected output

>chr1:117223140-117223856 TAG:GTGGG
GTGGgggggcgCATAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGAGttaGTAGTATCGAATCGCACGACTGACAGCTCAGCATCAGCGACGACTAGTGGTGGGCGACGACAgCGATATA
>chr2:117223140-117223856 TAG:GGGCT
ACGAGCAGCAGCAGCAGCagCCGATCGACGACTCAAGTACGATACGCGaacCCCCCGACGACGACTCACGA

My attempt (sed command):

sed ':a;N;$!ba;s/\([actgACGT]\)\n\([actgACGT]\)/\1\2/g' file.fasta

My (wrong) output:

>chr1:117223140-117223856 TAG:GTGGGGTGGgggggcgCATAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGAGttaGTAGTATCGAATCGCACGACTGACAGCTCAGCATCAGCGACGACTAGTGGTGGGCGACGACAgCGATATA
>chr2:117223140-117223856 TAG:GGGCTACGAGCAGCAGCAGCAGCagCCGATCGACGACTCAAGTACGATACGCGaacCCCCCGACGACGACTCACGA

The regular expression for a header (a line whose first character is ">") is "^>.*$", but I do not know how to include it in the sed command.

Thanks in advance.

Solution

This might work for you (GNU sed):

sed ':a;N;/>/!s/\n//;ta;P;D' file

Append the next line to the pattern space and look at the two lines together. If neither contains a >, delete the newline between them and repeat. If either of them does contain a >, print and delete the first of them, then repeat with what is left.
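
To make the steps easier to follow, here is the same script written with one command per line and a comment above each one. This is only a sketch of the one-liner above, tested against the sample from the question. It assumes GNU sed (at the end of input, GNU sed prints the pattern space when N finds no next line, which is what flushes the last sequence). The output name "joined.fasta" is a made-up name for illustration.

```shell
# Recreate the sample input from the question.
cat > file.fasta <<'EOF'
>chr1:117223140-117223856 TAG:GTGGG
GTGGgggggcgCATAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGAGtt
aGTAGTATCGAATCGCACGACTGACAGCTCAGCATCAGCGACGACTAGTG
GTGGGCGACGACAgCGATATA
>chr2:117223140-117223856 TAG:GGGCT
ACGAGCAGCAGCAGCAGCagCCGATCGACGACTCAAGTACGATACGCGaa
cCCCCCGACGACGACTCACGA
EOF

# Same commands as the one-liner, spread out (GNU sed assumed).
# "joined.fasta" is a hypothetical output name.
sed '
  # label marking the start of the joining loop
  :a
  # append the next input line to the pattern space
  N
  # no ">" anywhere in the pattern space: remove the newline (join the lines)
  />/!s/\n//
  # if a newline was just removed, jump back to :a and try the next line
  ta
  # otherwise print the pattern space up to the first newline
  P
  # delete what was printed and restart without reading a new line
  D
' file.fasta > joined.fasta

cat joined.fasta
```

Because the test is "does the pattern space contain a >" rather than "does the line end in a nucleotide letter", a header that happens to end in A, C, G or T (such as "TAG:GTGGG") is never glued to the sequence that follows it.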