I have been tasked with writing a shell script to grep through hundreds of log files in many directories on Linux and Solaris servers. Some of the logs are compressed in various formats and some are a few GB in size. I am worried about grep using a lot of resources on the server and possibly taking down the web servers running on the machine by exhausting memory (if that is even likely).
Should I uncompress the files, grep them and then compress them again, or use zgrep (or equivalent) to search them while compressed? Would there be a resource-wise advantage to using one method over the other?
Also, is there a simple way to restrict the memory usage of a command to a percentage of what is currently available?
If someone could explain how memory usage works while running these commands, it would help out a lot.
grep's memory usage is constant; it doesn't scale with file size†. It doesn't need to keep the whole file in memory, only the area it's searching through.
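If you want to convince yourself of that, a minimal sketch like the following (assuming GNU coreutils and GNU time on the Linux boxes, and a scratch path of your choosing, shown here as /var/tmp/big.log) measures grep's peak resident memory while scanning a multi-gigabyte file:

    # Build a ~2 GB file of short, repetitive lines.
    yes 'some log line with nothing interesting in it' | head -c 2G > /var/tmp/big.log

    # GNU time (/usr/bin/time, not the shell builtin) reports
    # "Maximum resident set size"; expect a few MB regardless of file size.
    /usr/bin/time -v grep -c 'nothing interesting' /var/tmp/big.log

    rm /var/tmp/big.log

The peak RSS stays roughly the same whether the file is 200 MB or 20 GB, because grep only buffers the current chunk it is matching against.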
Decompression is similar. Memory usage is proportional to the dictionary size, not to the total file size. Dictionary size is nothing to worry about: a few megabytes at most.
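In practice that means zgrep, or a decompressor piped into grep, streams the data: the uncompressed contents never land on disk and never sit in memory as a whole, so there is no need to uncompress, search, and recompress. A sketch, assuming gzip/bzip2/xz tools are installed and using a hypothetical /var/log/app directory:

    # Search a compressed log in place; both the decompressor and grep
    # work on a small sliding window, not the whole file.
    zgrep 'ERROR' /var/log/app/access.log.1.gz

    # Equivalent pipelines, handy when the logs use different formats:
    zcat  /var/log/app/access.log.1.gz  | grep 'ERROR'
    bzcat /var/log/app/access.log.2.bz2 | grep 'ERROR'
    xzcat /var/log/app/access.log.3.xz  | grep 'ERROR'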
I would not worry about some simple grep / zgrep / zcat | grep searches taking down other processes. This stuff is Linux's bread and butter.
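If you want extra insurance anyway, you can lower the search's scheduling priority, and ulimit can cap a command's memory inside a subshell. Note this is an absolute cap in kilobytes, not the percentage-of-available-memory limit you asked about; the numbers below are arbitrary, and ionice is Linux-only:

    # Run the search at the lowest CPU priority and idle I/O class
    # (drop ionice on Solaris).
    nice -n 19 ionice -c3 zgrep 'ERROR' /var/log/app/*.gz

    # Cap virtual memory at ~512 MB for this pipeline only; if it ever
    # tried to use more, it would fail rather than starve the web servers.
    ( ulimit -v 524288; zgrep 'ERROR' /var/log/app/*.gz )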
† Beware of scanning through files with incredibly long lines, though: grep's memory usage does scale with line length. You can use grep -I to skip binary files, which is usually sufficient.
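Binary files (a core dump or database file that ended up in a log directory) often contain no newlines at all, so grep effectively sees one enormous "line". A small sketch of that defense, assuming GNU grep and the same hypothetical directory:

    # Process binary files as if they contained no match, so their
    # newline-less contents are neither buffered as huge "lines" nor
    # dumped to the terminal. -I is GNU grep's short form of
    # --binary-files=without-match.
    grep -I 'ERROR' /var/log/app/*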