
Number of occurrences of a string in all files using GNU parallel


I am trying to count the occurrences of a particular string across all the files in a directory, so I used:

    find -name "application.log*" | parallel zgrep -c "Instructions before" {}

I expected it to count the string "Instructions before" in every application.log file, but instead it gives output like this:

    find -name "application.log*" | parallel zgrep -c "Instructions before" {}
    ./application.log.2020-05-22-08-24.gz:0
    gzip: before.gz: No such file or directory
    before:0
    ./application.log.2020-05-22-08-22.gz:0
    gzip: before.gz: No such file or directory
    before:0
    ./application.log.2020-05-22-08-29.gz:0
    gzip: before.gz: No such file or directory


Solution

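  • It's a quoting issue. Your shell removes the quotes before parallel ever sees the arguments, and parallel then joins the arguments into one command line that is run by another shell. Each zgrep process is therefore invoked as zgrep -c Instructions before ./application.log.blah.gz, with Instructions taken as the string to search for and before treated as one of the files to search. The "before.gz: No such file or directory" error comes from gzip, which zgrep uses for decompression and which apparently tries the .gz extension when a file is missing.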

    So you need to quote the quotes:

    find -name "application.log*" -print0 | parallel -0 zgrep -c '"Instructions before"' {}
    

    or let parallel do the quoting for you with -q (combined here with -0 as -0q):

    find -name "application.log*" -print0 | parallel -0q zgrep -c "Instructions before" {}
    

    If all the files you care about are in a single directory and not in subdirectories, see Mark's comment for a simpler approach that avoids find.
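The word splitting described in the answer can be reproduced without GNU parallel or any log files. This is a minimal sketch that assumes, as a simplification, that parallel's behaviour can be mimicked by joining the arguments into one string and handing it to sh -c. The variable names and the two helper functions are hypothetical; the file name is taken from the question's output.

```shell
#!/bin/sh
# Simplified model (assumption): parallel joins its arguments into one string
# and runs it in a shell, so quotes already removed by the outer shell are gone.

pattern='Instructions before'   # what parallel receives once the outer shell strips "..."
file='./application.log.2020-05-22-08-24.gz'

# Hypothetical helper: no extra quoting. The inner shell re-splits the pattern,
# so it sees the words  -c Instructions before ./application.log...gz
unquoted_words() {
    sh -c "printf '[%s]\n' -c $pattern $file"
}

# Hypothetical helper: the pattern is quoted again (what '"..."' or -q achieve),
# so the inner shell keeps "Instructions before" as a single word.
quoted_words() {
    sh -c "printf '[%s]\n' -c '$pattern' $file"
}

echo "without extra quoting:"
unquoted_words
echo "with extra quoting:"
quoted_words
```

The first helper prints four bracketed words, with [Instructions] and [before] separated, which matches the argument list zgrep received in the question; the second prints [Instructions before] as one word. If a single total is wanted instead of per-file counts, the output of the fixed command could be piped into awk -F: '{ s += $NF } END { print s }', since the last colon-separated field is the count whether or not a file name is printed. Note that zgrep -c counts matching lines, so a line containing the phrase twice is counted once.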