I need to create a big text file using the contents of several gzipped files with a specific name pattern. To do that I used:
find . -name '*dna.toplevel.txt.gz' -exec zcat {} >> all.txt \;
and it worked just fine. The problem is, now I need to edit the text on the fly to substitute a specific character ">" with ">filename|". I've managed to cook this up:
find . -name '*dna.toplevel.txt.gz' -exec zcat {} | sed 's/>/>{}|/g' >> all.txt \;
But I am getting the following errors:
I understand poor bash is confused because I did not specify correctly where each command ends, but I have no idea how to do it right.
-exec takes a simple command and its arguments; it does not handle shell constructs such as pipes or redirections at all. Your original command is equivalent to
find . -name '*dna.toplevel.txt.gz' -exec zcat {} \; >> all.txt
since the shell recognizes the output redirection immediately and removes it from the command line before identifying the command (find) and its arguments.
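A small illustration of that point, as a sketch only (the temporary directory and the file names a.txt and b.txt are made up for the demo): a redirection can sit anywhere among the words of a simple command, and the shell strips it out before the command sees its arguments.

```shell
# Throwaway directory for the demo (hypothetical names, not from the question)
demo_dir=$(mktemp -d)

# Redirection in the middle of the argument list...
echo one >> "$demo_dir/a.txt" two

# ...and at the end: echo receives the same two arguments in both cases
echo one two >> "$demo_dir/b.txt"

# cmp exits 0 when the two files have identical contents
cmp "$demo_dir/a.txt" "$demo_dir/b.txt" && echo "identical"
```

This is the same reason the >> all.txt in the find command applies to find as a whole rather than to zcat.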
Since sed requires the filename from find as part of its command, you'll need to run a shell that takes the pipeline as an argument via the -c option.
find . -name '*dna.toplevel.txt.gz' -exec \
sh -c "zcat {} | sed 's/>/>{}|/g'" \; >> all.txt
There are a few problems with this approach (for example, a filename containing quotes or $ is spliced directly into the script text and reinterpreted by sh); fixing them requires making the sh command quite a bit more complicated. If you are using bash 4 or later, I'd recommend ditching find altogether and using a shell loop along with the ** glob:
shopt -s globstar
for f in ./**/*dna.toplevel.txt.gz; do
    zcat "$f" | sed "s|>|>$f\||g"
done >> all.txt
If this command is creating all.txt, you can simply use > instead of >>. This also assumes that $f won't contain any | characters; if it does, you'll need to choose a different delimiter. (An & or a backslash in $f is likewise special in a sed replacement and would need escaping.)
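If find is still needed (for instance in a shell without globstar), one common way to sidestep the quoting problems of splicing {} into the script is to pass the filename as a positional parameter instead. The sketch below is only illustrative: the function name merge_tagged is made up for this example, and it assumes the filenames contain no |, & or backslash characters.

```shell
# merge_tagged is a hypothetical helper name, not part of find or sed.
# The script given to sh -c is a fixed string; find supplies each path
# as $1 (the word "sh" after the script fills $0), so spaces, quotes
# or $ in a path are never re-parsed as shell syntax.
merge_tagged() {
    find "${1:-.}" -name '*dna.toplevel.txt.gz' -exec sh -c '
        zcat "$1" | sed "s|>|>$1\||g"
    ' sh {} \;
}
```

Called as merge_tagged . >> all.txt from the top-level directory, this should give the same kind of output as the loop: every > becomes > followed by the path and a literal |.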