Tags: shell, awk, grep, pipe, tcsh

Empty files being included in a grep: More efficient to just ignore them or use arguments/piping to filter them out?


I have a tcsh alias set up on a server that processes the Apache logs for the domains hosted on it, looking for accesses of wp-login.php. After excluding the error.log files in the initial grep, it runs a second grep to drop the zero-count results, then pipes the output through awk, sort, and head for cleaner reading.

grep -c --exclude="/var/log/httpd/domains/*.error.log" wp-login.php /var/log/httpd/domains/*.log | grep -v :0 | awk -F':' '{print $2,$1}' | sort -nr | head -n 10

Of the files scanned by that first grep, 90% are empty. From an efficiency perspective, would it be better to work around the 1k+ empty files (and if so, how), or is the time spent processing them so small that, even with over a thousand empty log files against only about a hundred non-empty ones, the gains would be minimal?
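
One way to settle this is to time the two approaches directly against the real log directory. A rough sketch, run from sh/bash rather than through the tcsh alias so that time covers the whole command (paths copied from the command above):

    # original approach: glob everything, let grep's --exclude drop the error logs
    time grep -c --exclude="/var/log/httpd/domains/*.error.log" wp-login.php /var/log/httpd/domains/*.log > /dev/null
    # candidate approach: have find skip empty files before grep ever sees them
    time find /var/log/httpd/domains/ -maxdepth 1 -type f -name '*.log' ! -name '*.error.log' ! -empty -exec grep -c wp-login.php {} + > /dev/null

With the numbers described here, the cost is likely dominated by reading the non-empty logs rather than by opening the empty ones, but measuring on the actual server is the only reliable answer.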

Example output

# wpbf
12 /var/log/httpd/domains/DOMAIN1.TLD.log
10 /var/log/httpd/domains/DOMAIN2.TLD.log
8 /var/log/httpd/domains/DOMAIN3.TLD.log
7 /var/log/httpd/domains/DOMAIN4.TLD.log
6 /var/log/httpd/domains/DOMAIN5.TLD.log
6 /var/log/httpd/domains/DOMAIN6.TLD.log
6 /var/log/httpd/domains/DOMAIN7.TLD.log
6 /var/log/httpd/domains/DOMAIN8.TLD.log
6 /var/log/httpd/domains/DOMAIN9.TLD.log
6 /var/log/httpd/domains/DOMAIN10.TLD.log

Solution

  • Try using find with the -empty option to select the files first:

    find /var/log/httpd/domains/ -maxdepth 1 -type f -name '*.log' '!' -name '*.error.log' '!' -empty |
    xargs -d'\n' grep -Hc wp-login.php |
    awk -F: '$2 != 0{print $2,$1}' | sort -nr | head -n10
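
    If any of the log file names could contain whitespace, a NUL-delimited variant of the same idea avoids the word-splitting issue. This is a sketch assuming GNU find and xargs; -r additionally keeps xargs from running grep at all when find matches nothing:

    # same selection as above, but passing NUL-terminated names to xargs
    find /var/log/httpd/domains/ -maxdepth 1 -type f -name '*.log' '!' -name '*.error.log' '!' -empty -print0 |
    xargs -0 -r grep -Hc wp-login.php |
    awk -F: '$2 != 0 {print $2, $1}' | sort -nr | head -n 10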