csplit in zsh: splitting a file based on a pattern


I would like to split the following file based on the pattern ABC:

ABC
4
5
6
ABC
1
2
3
ABC
1
2
3
4
ABC
8
2
3

to get file1:

ABC
4
5
6

file2:

ABC
1
2
3

etc.

According to man csplit, the syntax is: csplit my_file /regex/ {num}.

I can split this file using: csplit my_file '/^ABC$/' {2} but this requires me to supply a number for {num}. When I try {*} instead, which is supposed to repeat the pattern as many times as possible, I get the error:

csplit: *}: bad repetition count

I am using zsh.


Solution

  • To split a file on a pattern like this, I would turn to awk:

    awk 'BEGIN { i = 0 }
         /^ABC/ { ++i }
         # Parenthesize the name: an unparenthesized concatenation after >> is ambiguous.
         { print >> ("file" i) }' < input

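    This reads lines from the file named input. Before any line is read, the BEGIN section explicitly initializes the "i" variable to zero. An uninitialized awk variable is zero in numeric context but the empty string when concatenated, so being explicit ensures that any lines before the first "ABC" go to "file0". The "i" variable is the index used in the serial file names.

    Each line that starts with "ABC" then increments "i". Note that /^ABC/ also matches lines such as "ABCD"; use /^ABC$/ if only lines consisting of exactly "ABC" should start a new file.

    Every line, including the "ABC" line itself, is then appended to the file whose name is built from the text "file" and the current value of "i". Because the output is opened in append mode, remove any existing file1, file2, and so on before rerunning the command, or the new lines will be added to the old contents.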
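
As a way to try the approach end to end, here is a minimal sketch that recreates the sample input from the question in a scratch directory and splits it. It is a variation on the command above, not the answer's exact code: the variable name "out", the scratch directory, and the call to close() are assumptions added for illustration. Closing each finished file is a precaution for inputs with many sections, since some awk implementations limit the number of simultaneously open files.

```shell
#!/bin/sh
# Work in a scratch directory so the generated files do not clutter the current one.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Recreate the sample input from the question, one value per line.
printf '%s\n' ABC 4 5 6 ABC 1 2 3 ABC 1 2 3 4 ABC 8 2 3 > input

# Start a new output file at each line that begins with ABC.
# "out" (a name chosen for this sketch) holds the current output file name.
# close() releases the previous file before the next one is opened.
# Lines before the first ABC are skipped in this variant, because out is still empty.
awk '/^ABC/    { if (out != "") close(out); i++; out = "file" i }
     out != "" { print > out }' input

# List the pieces that were written.
ls file*
```

With the sample input this writes one file per "ABC" section, file1 through file4. This variant uses > rather than >>, so each run overwrites the pieces from a previous run instead of appending to them.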