
Splitting file based on pattern '\r\n00' in korn shell


My file temp.txt looks like this:

00ABC
PQR123400
00XYZ001234
012345
0012233

I want to split the file on the pattern '\r\n00'. In this case temp.txt should be split into 3 files:

first.txt: 
00ABC
PQR123400

second.txt:
00XYZ001234
012345

third.txt:
0012233

I am trying to use csplit to match the pattern '\r\n00', but the debug output reports an invalid pattern. Can someone please help me match this exact pattern using csplit?


Solution

  • With your shown samples, please try the following awk code. It was written and tested in GNU awk, and it relies on the RT variable, which only GNU awk provides.

    This code creates files named 1.txt, 2.txt, and so on in the current directory. It also closes each output file right after writing it, which avoids the infamous "too many open files" error.

    awk -v RS='\r?\n00' -v count="1" '
    {
      outputFile=(count++".txt")
      rt=RT
      sub(/\r?\n/,"",rt)
      if(!rt){
        sub(/(\r?\n)+$/,"")
        rt=prevRT
      }
      printf("%s%s\n",(count>2?rt:""),$0) > outputFile
      close(outputFile)
      prevRT=rt
    }
    '  Input_file
    

    Explanation: a detailed, line-by-line explanation of the code above.

    awk -v RS='\r?\n00' -v count="1" '      ##Start the awk program; set RS to the regex \r?\n00 and count to 1.
    {
      outputFile=(count++".txt")            ##Build the output file name from count (incremented for every record) followed by .txt.
      rt=RT                                 ##Copy RT, the text that matched RS after this record, into rt.
      sub(/\r?\n/,"",rt)                    ##Remove the \r?\n from rt, leaving only the 00.
      if(!rt){                              ##If rt is empty (last record, where nothing matched RS), do the following.
        sub(/(\r?\n)+$/,"")                 ##Remove the trailing line break(s) from the end of the record only.
        rt=prevRT                           ##Reuse the previous value (prevRT) as rt.
      }
      printf("%s%s\n",(count>2?rt:""),$0) > outputFile   ##Print rt (skipped for the first record, which still has its own 00) and the record into outputFile.
      close(outputFile)                     ##Close outputFile so that open files do not pile up.
      prevRT=rt                             ##Save rt into prevRT for the next record.
    }
    '  Input_file                           ##Input_file is the name of the file to split.
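
As for csplit: it tests its pattern against one line at a time, so a pattern that contains a line break, such as '\r\n00', can never match there. If GNU awk is not available, a line-based variant may be enough. The sketch below is not the code above but a simpler alternative, under two assumptions that are mine rather than the question's: every chunk begins on a line that starts with 00, and the input uses plain \n line endings. The printf line only recreates the sample temp.txt so the snippet runs on its own.

```shell
# Recreate the sample temp.txt from the question (assumption: LF line endings).
printf '%s\n' 00ABC PQR123400 00XYZ001234 012345 0012233 > temp.txt

# Start a new numbered file at every line that begins with "00".
awk '
/^00/ {                        # this line opens a new chunk
  if (out != "") close(out)    # close the previous chunk before moving on
  n++
  out = n ".txt"               # 1.txt, 2.txt, ...
}
out != "" { print > out }      # lines before the first "00" line are dropped
' temp.txt
```

This should leave 1.txt, 2.txt and 3.txt next to temp.txt, matching the three files asked for in the question. If the file really has \r\n line endings, each output line keeps its trailing \r, so strip it first if that matters.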