Convert raw sequences to FASTA using awk


I have a list of short nucleotide sequences, one per line, which I need to convert to FASTA format. I'm trying with awk, but my code so far just hangs, even on a 10-line test file. My input file looks like:

ACGTACGTACGT
CGTACGTACGTA
GTACGTACGTAC
TACGTACGTACG

My output should have a numbered header line for each sequence - the number could be just counting from 1 or taking the line number from the input file (which should be the same), with the sequence on a new line, like this:

> seq 1
ACGTACGTACGT
> seq 2
CGTACGTACGTA
> seq 3
GTACGTACGTAC
> seq 4
TACGTACGTACG

I tried using the NR variable for the count:

awk -F '{echo "> seq ",NR;"\n"; print $0}' in.txt > out.fasta   

Any suggestions welcome - I'm new at this!


Solution

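  • Could you please try the following. Note that in your attempt -F takes the next argument as the field separator, so the quoted program is never run as a program, and echo is not an awk statement (use print instead).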

    awk '{print "> seq " ++count ORS $0}'  Input_file
    

    If you would rather use awk's built-in FNR variable (the line number within the current input file) instead of a counter, you could try the following too.

    awk '{print "> seq " FNR ORS $0}'  Input_file
    

    To save the result, redirect the output of either command to a file by appending > output_file to it.
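
As a minimal end-to-end sketch of the commands above: the file names in.txt, out.fasta and out_fnr.fasta are only placeholders chosen for illustration, and the test input is the four sequences from the question. It also shows one case where the two variants differ: when the same awk command reads more than one file, FNR restarts at 1 for each file, while the counter keeps increasing.

```shell
# Build a small test input (in.txt is a placeholder name).
printf '%s\n' ACGTACGTACGT CGTACGTACGTA GTACGTACGTAC TACGTACGTACG > in.txt

# ORS is awk's output record separator (a newline by default), so each
# print emits the header, a newline, and then the sequence line ($0).
awk '{print "> seq " ++count ORS $0}' in.txt > out.fasta

# Same idea with FNR. Reading the file twice here only demonstrates
# that FNR restarts at 1 for every input file.
awk '{print "> seq " FNR ORS $0}' in.txt in.txt > out_fnr.fasta

cat out.fasta
```

With a single input file both variants should give the same numbering, so the choice between them only matters when several files are processed in one call.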