
Divide a very large file into small ones following a pattern


I have been working on this problem with little success, so I am coming here for some fresh advice.

I am trying to extract the data of each scan into a separate file.

The problem is that after 3196 files are created, I receive the error message: awk "makes too many open files".

I understand that I need to close the files created by awk, but I don't know how to do that.

The input file looks like this (up to 80,000 scans):

Scan    1
11111    111
22222    221
...
Scan    2
11122    111
11122    111
...
Scan    3
11522    141
19922    141
...

For now I have been doing :

awk '/.*Scan.*/{n++}{print >"filescan" n }' inputfile

This gives me an incremented output file for every scan, but it crashes after 3196 files are created.

cat filescan1
Scan    1
11111    111
22222    221
...

Any ideas?
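(For anyone reproducing this: the crash happens when awk exceeds the per-process open-file-descriptor limit imposed by the OS. A quick sketch, assuming a POSIX shell, to check that limit on your system:)

```shell
# Show the maximum number of file descriptors this process may hold open.
# If awk keeps every output file open, it fails once it reaches roughly
# this limit (minus the descriptors already in use, e.g. stdin/stdout).
ulimit -n
```

If the number printed is close to 3196-plus-a-few, that confirms the descriptor limit is what awk is hitting.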


Solution

  • You need to close each output file once you are done with it, because awk otherwise keeps every file handle open.

    awk '/Scan/ {
      close(file)        # close the previous output file (a no-op on the first scan)
      n++
    }
    {
      file = "filescan" n
      print >> file      # append, since the file may be reopened after close()
    }' inputfile