linux bash find csplit

Invalid parameters using find and csplit


This should be a simple task ... !

I have a directory with a number of html files. Each one has a div with a class called crumb. I want to split each file in two at crumb. Later, I'll concatenate the second part of the split file with a new beginning part.

So I tried this, to split all the html files - actually two files called news.html and about.html for the moment - on the pattern crumb:

find *.html -exec csplit - /crumb/ {} \;

But I have this response:

csplit: ‘about.html’: invalid pattern
csplit: ‘news.html’: invalid pattern

Why are the file names being interpreted as a pattern?


Solution

  • You can get insight into the problem by adding 'echo' in front of the csplit command:

    find *.html -exec echo csplit - /crumb/ {} \;
    

    Which will show

    csplit - /crumb/ about.html
    csplit - /crumb/ news.html
    

    Running those commands interactively produces the error from the question: csplit: ‘about.html’: invalid pattern

    The csplit man page gives the usage as 'csplit [OPTION]... FILE PATTERN...': the first argument must be the file name, followed by one or more patterns. In the generated command, '-' takes the FILE position (so csplit reads standard input), /crumb/ is the first pattern, and the file name, which comes AFTER the pattern, is parsed as a second pattern. That is why the file names are reported as invalid patterns.

    Proposed fix:

    find *.html -exec csplit {} /crumb/ \;
    
    # OR, with a unique prefix for every file and a 3-digit suffix
    find *.html -exec csplit --prefix {} --suffix-format='%03d' {} /crumb/ \;
    

    Which will execute:

    csplit about.html /crumb/
    csplit news.html /crumb/
    

    It is not possible to tell whether this produces the requested output (the files split as needed), as the input files were not provided.
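
Since the real input files are not available, the proposed fix can at least be tried on throwaway data. The following is a minimal sketch, not a tested solution for the actual pages: the two sample files and their contents are invented for illustration, and the long options assume GNU csplit. It runs in a temporary directory so nothing existing is overwritten.

```shell
#!/bin/sh
# Sketch only: sample files and their contents are made up for illustration.
# Assumes GNU csplit (long options --quiet, --prefix, --suffix-format).
set -eu

workdir=$(mktemp -d)    # scratch directory, so no real files are touched
cd "$workdir"

# Two hypothetical pages, each with exactly one line matching /crumb/.
for name in about news; do
    printf '%s\n' \
        '<html><body>' \
        "<h1>$name</h1>" \
        '<div class="crumb">Home</div>' \
        '<p>content</p>' \
        '</body></html>' > "$name.html"
done

# File name first, pattern second. --prefix {} names the pieces after the
# input file; --quiet suppresses the byte counts csplit normally prints.
find *.html -exec csplit --quiet --prefix {} --suffix-format='%03d' {} /crumb/ \;

# Piece 000 holds the lines before the match, piece 001 starts at the match.
head -n 1 about.html001

# Sanity check: the two pieces, rejoined, are identical to the original.
for f in about.html news.html; do
    cat "$f"000 "$f"001 | cmp - "$f"
done
```

With this naming, the second half of each page ends up in about.html001 and news.html001, which are the files to concatenate with a new beginning part later. Note that csplit puts the matching line itself at the start of the second piece, so the crumb div stays with the content that follows it.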