bash sed find zcat

Using zcat and sed in find -exec


I need to create a big text file using the contents of several gzipped files with a specific name pattern. To do that I used:

find . -name '*dna.toplevel.txt.gz' -exec zcat {} >> all.txt \;

and it worked just fine. The problem is, now I need to edit the text on the fly to substitute a specific character ">" with ">filename|". I've managed to cook this up:

find . -name '*dna.toplevel.txt.gz' -exec zcat {} | sed 's/>/>{}|/g' >> all.txt \;

But I am getting the following errors:

  • sed: can't read ;: No such file or directory
  • find: missing argument to `-exec'

I understand poor bash is confused because I did not specify correctly where each command ends, but I have no idea how to do it right.


Solution

  • -exec takes a simple command and its arguments; it does not handle shell constructs like pipes or redirections at all. Your first command (the one that worked) is equivalent to

    find . -name '*dna.toplevel.txt.gz' -exec zcat {} \; >> all.txt
    

    since the shell recognizes the output redirection immediately and removes it from the command line before identifying the command (find) and its arguments.
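
The parsing described above can be observed directly. This is a small sketch, not part of the original answer: show_args is a hypothetical helper that prints every argument it receives in brackets, and the temporary file stands in for all.txt.

```shell
# show_args is a hypothetical helper: it prints each argument it receives in brackets.
show_args() { printf '[%s]' "$@"; printf '\n'; }

out=$(mktemp)   # stand-in for all.txt

# A redirection in the middle of the command line is removed by the shell
# before the command sees its arguments, so \; still arrives as the last one.
show_args -exec zcat {} >> "$out" \;
cat "$out"      # prints [-exec][zcat][{}][;]

# A pipe ends the command: everything after | belongs to the next command,
# so the left-hand side never receives the terminating \;.
show_args -exec zcat {} | cat   # prints [-exec][zcat][{}]
```

The second case mirrors the failing command: find is left with an -exec that has no terminator, and sed receives ; as a filename, which matches the two error messages in the question.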

    Since sed requires the filename from find as part of its command, you'll need to run a shell that takes the pipeline as an argument via the -c option.

    find . -name '*dna.toplevel.txt.gz' -exec \
      sh -c "zcat {} | sed 's/>/>{}|/g'" \; >> all.txt
    

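    There are a few problems with this approach: the filename is pasted into the shell code itself, so a name containing spaces, quotes, or other shell metacharacters can break the command or even be executed as code, and POSIX does not guarantee that {} is substituted when it is only part of a larger argument. Fixing them requires making the sh command quite a bit more complicated. If you are using bash 4 or later, I'd recommend ditching find altogether and using a shell loop along with the ** glob: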

    shopt -s globstar
    for f in ./**/*dna.toplevel.txt.gz; do
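        zcat "$f" | sed "s#>#>$f|#g"
    done >> all.txt
    

    If this command is creating all.txt, you can simply use > instead of >>. The sed command uses # as its delimiter so that the literal | you asked for can follow the filename in the replacement. This assumes that $f won't contain any # characters; if it does, you'll need to choose a different delimiter. An & or a backslash in $f would also be interpreted by sed rather than inserted literally.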
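
If find has to stay, the more involved sh invocation could look roughly like the sketch below. The filename is handed to sh as a positional parameter instead of being pasted into the script text. The demo directory, the sample files, and the # delimiter are assumptions made for illustration, not part of the original question.

```shell
# Build a throwaway directory with two small gzipped files (illustrative data only).
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b dir"
printf '>seq1\nACGT\n' | gzip > "$demo/a/x.dna.toplevel.txt.gz"
printf '>seq2\nTTGA\n' | gzip > "$demo/b dir/y.dna.toplevel.txt.gz"

# The single-quoted script never contains a filename. find appends the matching
# paths after the placeholder name "sh" (which becomes $0), and the loop reads
# them from "$@". The redirection is handled by the outer shell, once.
(
    cd "$demo" &&
    find . -name '*dna.toplevel.txt.gz' -exec sh -c '
        for f in "$@"; do
            zcat "$f" | sed "s#>#>$f|#g"
        done
    ' sh {} +
) > "$demo/all.txt"

cat "$demo/all.txt"
```

Because the path arrives as data, a directory name with a space (as in "b dir" above) is handled without extra quoting tricks. The order in which find visits the files is not guaranteed, and the same caveat about #, & and backslashes in filenames applies to the sed expression.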