
Merge the last two lines of every 3-line record


I want to merge lines 2 and 3 of every three-line block and keep line 1 as it is. Here is an example of my text:

>chrX:147147161-147148161
ATGATGGTGATGTACAGATGGGTTTTTGG
TTATCTAATTCATGTGTTGGTCAGATCAA
>chrY:16119725-16120725
CAGCTTTGTTCCGTTGCTGGTGAGGAACT
GACTCCCTGGGTGTAGGACCCTCCGAGCC

This is what I want it to look like:

>chrX:147147161-147148161
ATGATGGTGATGTACAGATGGGTTTTTGGTTATCTAATTCATGTGTTGGTCAGATCAA
>chrY:16119725-16120725
CAGCTTTGTTCCGTTGCTGGTGAGGAACTGACTCCCTGGGTGTAGGACCCTCCGAGCC

I have tried several approaches, but none has worked so far. Here is what I tried:

> sed '/>$/,/>$/ {//b; N; s/\n//;}' file.txt

This command did not merge the lines. Before that, I also tried:

> paste -d "" - - < txt.file

This only merges the chr line with the first sequence line, which is not what I want. Can someone give me some suggestions? Thank you!


Solution

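  • You are dealing with FASTA files, which are easy to process with awk. The following works for generic FASTA files with one or more lines per sequence: it splits the input into records at each ">", takes the first line of a record as the header, and joins the remaining lines into a single sequence.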

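    awk 'BEGIN{RS=">";FS="\n";OFS=""}   # one record per ">", one field per line
        (FNR==1){next}                  # skip the empty record before the first ">"
        # field 1 is the header; the rest of the record is the sequence, newlines removed
        {name=$1;seq=substr($0,index($0,FS));gsub(FS,OFS,seq)}
        {print RS name FS seq}' file.fasta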
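
To see the command in action, here is a minimal sketch that runs it on the sample from the question. The function names `unwrap_fasta` and `join_lines_2_3` and the temporary file are hypothetical, introduced only for this illustration. The second function is not part of the answer above: it is a simpler alternative that assumes every record is exactly three lines (one header, two sequence lines), which is the layout shown in the question.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program from the answer above.
# Handles any number of sequence lines per record.
unwrap_fasta() {
    awk 'BEGIN{RS=">";FS="\n";OFS=""}
        (FNR==1){next}
        {name=$1;seq=substr($0,index($0,FS));gsub(FS,OFS,seq)}
        {print RS name FS seq}' "$@"
}

# Alternative, assuming strictly 3 lines per record:
# line 2 of each block is printed without a trailing newline,
# so line 3 ends up appended to it.
join_lines_2_3() {
    awk 'NR%3==2{printf "%s",$0; next} {print}' "$@"
}

# Sample input from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>chrX:147147161-147148161
ATGATGGTGATGTACAGATGGGTTTTTGG
TTATCTAATTCATGTGTTGGTCAGATCAA
>chrY:16119725-16120725
CAGCTTTGTTCCGTTGCTGGTGAGGAACT
GACTCCCTGGGTGTAGGACCCTCCGAGCC
EOF

result=$(unwrap_fasta "$tmp")
result_strict=$(join_lines_2_3 "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

On this input both functions print the four lines given as the desired output in the question. The `paste -d "" - -` attempt joined the wrong lines because it groups the input in pairs, so the header is glued to the first sequence line; the record-based approach avoids that by treating each ">" as the start of a new unit.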