Append fasta header ids with matching strings from another file with AWK


I'd like to append information from another file to the matching FASTA header IDs, as shown below. I've tried many awk commands and searched other threads, but nothing has worked. Any help would be greatly appreciated.

File_1
>id1
agcataattaat
>id2
gccatataatgg
>id3
gccaaattaggg
>id4
ataatttagccc

File_2
>id2 descriptionXYZ
>id4 description3E4

Desired output
>id1
agcataattaat
>id2 descriptionXYZ
gccatataatgg
>id3
gccaaattaggg
>id4 description3E4
ataatttagccc


Solution

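  • # 1. Put each header and its sequence on one line, separated by a space
    sed -r 'N;s/\n/ /' file1 > fileSorted.fasta
    # 2. Join on the id; -a 2 keeps records with no match in file2.txt (both inputs must be sorted by id)
    join -a 2 -1 1 -2 1 file2.txt fileSorted.fasta > out.fasta
    # 3. Split header and sequence back onto separate lines
    sed -r 's/^(.+) ([atgc]+)$/\1\n\2/' out.fasta > out2.fasta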
    

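    Note: for multi-line FASTA, first remove the line breaks inside each sequence
    so that every record is one header line followed by one sequence line.
    Then merge each header with its sequence line, run join on the sorted files,
    and finally restore the line break after each header.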
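
Since the question asks for awk, here is a minimal sketch of a two-pass awk alternative. It is not the answer's method above, just one common way to do this kind of lookup. The file names file1.fasta, file2.txt and out.fasta are assumptions, and the printf lines only recreate the sample data from the question. It assumes file2.txt is not empty and that each id appears at most once in it.

```shell
# Recreate the sample inputs from the question (file names are assumed).
printf '%s\n' '>id1' 'agcataattaat' '>id2' 'gccatataatgg' \
              '>id3' 'gccaaattaggg' '>id4' 'ataatttagccc' > file1.fasta
printf '%s\n' '>id2 descriptionXYZ' '>id4 description3E4' > file2.txt

# First file (NR == FNR): store each full header line from file2.txt, keyed by its id.
# Second file: if a header's id was stored, replace the header with the stored line.
# Every line of file1.fasta is printed, changed or not.
awk 'NR == FNR { desc[$1] = $0; next }
     /^>/ && ($1 in desc) { $0 = desc[$1] }
     { print }' file2.txt file1.fasta > out.fasta

cat out.fasta
```

On the sample data this should reproduce the desired output from the question. Because only lines starting with ">" are ever rewritten, this approach needs neither sorted input nor single-line sequences, so multi-line FASTA can be passed through as it is.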