I have a list of short nucleotide sequences, one per line, which I need to convert to FASTA format. I'm trying with awk, but my code so far just hangs, even on a 10-line test file. My input file looks like:
ACGTACGTACGT
CGTACGTACGTA
GTACGTACGTAC
TACGTACGTACG
My output should have a numbered header line for each sequence - the number could be just counting from 1 or taking the line number from the input file (which should be the same), with the sequence on a new line, like this:
> seq 1
ACGTACGTACGT
> seq 2
CGTACGTACGTA
> seq 3
GTACGTACGTAC
> seq 4
TACGTACGTACG
I tried using the NR variable for the count:
awk -F '{echo "> seq ",NR;"\n"; print $0}' in.txt > out.fasta
Any suggestions welcome - I'm new at this!
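Could you please try the following. In your attempt, -F expects a field separator as its next argument, so the quoted script is taken as the separator instead of the program; also, echo is a shell command, not an awk statement, so print is used here.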
awk '{print "> seq " ++count ORS $0}' Input_file
In case you want to use FNR (awk's per-file line count variable), then you could try the following too.
awk '{print "> seq " FNR ORS $0}' Input_file
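With a single input file, NR and FNR give the same numbering; they should only differ when awk is given more than one file. A small sketch of that difference, assuming two hypothetical files a.txt and b.txt created in a scratch directory:

```shell
# Work in a scratch directory so no existing files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical test files, two sequences each.
printf '%s\n' ACGT CGTA > a.txt
printf '%s\n' GTAC TACG > b.txt

# NR keeps counting across all input files (1, 2, 3, 4).
nr_out=$(awk '{print "> seq " NR ORS $0}' a.txt b.txt)

# FNR restarts at 1 for each new file (1, 2, 1, 2).
fnr_out=$(awk '{print "> seq " FNR ORS $0}' a.txt b.txt)

printf '%s\n' "--- NR ---" "$nr_out" "--- FNR ---" "$fnr_out"
```

So if several input files are ever combined into one FASTA file, NR (or the ++count version) is probably the safer choice for unique headers.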
You could redirect the output of the above commands to an output file by appending > output_file to them.
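As a quick check, the count-based command can be run end to end on the four sequences from the question. This is only a sketch: the scratch directory and the file names in.txt and out.fasta are assumptions for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Recreate the sample input from the question, one sequence per line.
printf '%s\n' ACGTACGTACGT CGTACGTACGTA GTACGTACGTAC TACGTACGTACG > in.txt

# For every line: print the header with an incrementing counter, then ORS
# (the output record separator, a newline by default), then the line itself.
# An unset awk variable starts at 0, so ++count yields 1 on the first line.
awk '{print "> seq " ++count ORS $0}' in.txt > out.fasta

# Show the result.
cat out.fasta
```

The output file should contain eight lines, alternating between a header such as "> seq 1" and the corresponding sequence.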