AWK: Write lines into multiple files


I'm trying to extract sequences from a FASTA file using awk.

The file looks like this and contains 703 sequences. I want to write each sequence to its own file.

>sequence_1
AACTTGGCCTT
>sequence_2
AACTTGGCCTT
.
.
.

I'm using this awk script:

awk '/>/ {OUT=substr($0,2) ".fasta"}; OUT {print >OUT}' file.fasta

...which works, but only for the first 16 sequences; then I get this error:

.fasta makes too many open files
input record number 35, file file.fasta
source line number 1

Solution

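  • You need to close each file once you're done writing to it. awk keeps every output file open until you close it or the program exits, so after enough sequences it hits the limit on open files. Try: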

    awk '/>/ {close(OUT); OUT=substr($0,2) ".fasta"}; OUT {print > OUT}' file.fasta
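
If you want to check the behaviour before running it on the real file, here is a minimal sketch on a tiny made-up input. The file name demo.fasta, the sample sequences, and the scratch directory are assumptions for illustration only; the awk program is the same idea as above, with the header match anchored to the start of the line and the close skipped for the first record.

```shell
# Work in a scratch directory so the demo files do not clutter anything.
cd "$(mktemp -d)"

# Hypothetical stand-in for file.fasta: three records, the last one
# with its sequence wrapped over two lines.
printf '>sequence_1\nAACTTGGCCTT\n>sequence_2\nGGCCTTAACTT\n>sequence_3\nTTGGAACC\nCCGGTT\n' > demo.fasta

# On each header line: close the previous output file (if there is one),
# then build the new file name from the header text after ">".
# Every line, header included, is written to the current output file.
awk '/^>/ { if (out != "") close(out); out = substr($0, 2) ".fasta" }
     out != "" { print > out }' demo.fasta

# List the results: demo.fasta plus one file per sequence.
ls
```

Only one output file is open at any time, so the number of sequences no longer matters. One caveat: if the same header appears twice in the input, reopening the file with > after close() truncates it, so the later record overwrites the earlier one; using >> instead would append.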