My file temp.txt looks like this:
00ABC
PQR123400
00XYZ001234
012345
0012233
I want to split the file based on the pattern '\r\n00'. In this case temp.txt should be split into 3 files:
first.txt:
00ABC
PQR123400
second.txt:
00XYZ001234
012345
third.txt:
0012233
I am trying to use csplit to match the pattern '\r\n00', but it reports "invalid pattern". Can someone please help me match this exact pattern using csplit?
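On the csplit attempt itself: csplit works on one line at a time, so a pattern never sees the '\r\n' that ends the previous line. GNU csplit also reports "invalid pattern" for any argument that is neither a line number nor a regex wrapped in /.../ or %...%. A minimal sketch, assuming GNU coreutils csplit (the -z option and the '{*}' repeat count are GNU extensions) and a hypothetical part_ prefix for the output names, is to split before every line that starts with 00 instead:

```shell
# Work in a scratch directory (an assumption for this example only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample temp.txt, assuming CRLF line endings as '\r\n00' suggests.
printf '00ABC\r\nPQR123400\r\n00XYZ001234\r\n012345\r\n0012233\r\n' > temp.txt

# '/^00/' splits before each line starting with 00; '{*}' repeats that until the input ends.
# -z drops the empty piece before the first match, -s suppresses the size report,
# -f sets the (hypothetical) output prefix.
csplit -z -s -f part_ temp.txt '/^00/' '{*}'

ls part_*
```

The pieces get csplit's numbered names rather than first.txt, second.txt and so on, so they would still need renaming afterwards.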
With your shown samples, please try the following awk code. Written and tested in GNU awk; it relies on the RT variable, which is specific to GNU awk.
This code will create files with names like 1.txt, 2.txt and so on in your system. It also takes care of closing each output file in the background, so that we don't run into the infamous "too many open files" error.
awk -v RS='\r?\n00' -v count="1" '
{
  outputFile=(count++".txt")
  rt=RT
  sub(/\r?\n/,"",rt)
  if(!rt){
    sub(/\n+$/,"")
    rt=prevRT
  }
  printf("%s%s\n",(count>2?rt:""),$0) > outputFile
  close(outputFile)
  prevRT=rt
}
' Input_file
Explanation: a detailed explanation of the above code.
awk -v RS='\r?\n00' -v count="1" ' ##Start the awk program; set RS to \r?\n00 and count to 1.
{
  outputFile=(count++".txt") ##Build the output file name from count (incremented for every record) followed by .txt.
  rt=RT ##Copy RT (the text that matched RS after this record) into rt.
  sub(/\r?\n/,"",rt) ##Remove \r?\n from rt, leaving only 00.
  if(!rt){ ##If rt is empty (last record, no separator follows it), do the following.
    sub(/\n+$/,"") ##Remove the trailing newline(s) from the current record; the $ anchor keeps newlines inside the record intact.
    rt=prevRT ##Set rt to prevRT, the separator saved from the previous record.
  }
  printf("%s%s\n",(count>2?rt:""),$0) > outputFile ##Print rt (only from the second record onward, whose leading 00 was consumed by RS) followed by the current record into outputFile.
  close(outputFile) ##Close outputFile so that open files do not pile up.
  prevRT=rt ##Save rt in prevRT for the next record.
}
' Input_file ##Input_file is the name of the file to split.
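Since every piece in the shown sample starts on a line beginning with 00, the same split can also be sketched without RT. The version below is an alternative under that assumption, not the code explained above: it opens a new numbered file at each line matching /^00/ and closes the previous one first. The scratch directory and the recreated temp.txt (with CRLF endings, as the '\r\n00' pattern suggests) are only there to make the example self-contained.

```shell
# Work in a scratch directory (an assumption for this example only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample temp.txt from the question, assuming CRLF line endings.
printf '00ABC\r\nPQR123400\r\n00XYZ001234\r\n012345\r\n0012233\r\n' > temp.txt

awk '
/^00/ {
  if (out != "") close(out)   # close the previous piece before starting the next
  out = (++n) ".txt"          # 1.txt, 2.txt, ...
}
out != "" { print > out }     # lines before the first 00 line (none here) are skipped
' temp.txt

ls
```

Unlike the RS-based version, this writes each line back unchanged, so the pieces keep whatever line endings the input has.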