
"Argument list too long" error when passing find output as arguments to wc


I have about 5 million text files under one directory, all in the same format (nothing special, just plain text files with some integers on each line). I would like to find the maximum and minimum line counts among all these files, along with the two corresponding filenames (one for the maximum and one for the minimum).

I started out by trying to write out all the line counts like so (planning to work out how to find the min and max from this list afterwards):

wc -l `find /some/data/dir/with/text/files/ -type f` > report.txt

but this fails with the following error:

bash: /usr/bin/wc: Argument list too long

Perhaps there is a better way to go about this?
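To see where this error comes from, the system limit can be compared with the size of the file list that the backtick substitution places on the wc command line. The following is a rough sketch, assuming a POSIX shell with getconf available. The script name check_arglen.sh and the default directory "." are illustrative choices, not part of the original question.

```shell
#!/bin/sh
# Compare the system argument-size limit with the approximate size of the
# file list that `find ...` would expand to on a single command line.
# The directory is taken from the first argument (defaults to ".").
dir="${1:-.}"

# Maximum bytes of arguments plus environment accepted by exec().
limit=$(getconf ARG_MAX)

# Approximate bytes needed: every path plus one byte (the newline printed by
# find stands in for the NUL that terminates each argument). Per-argument
# pointer overhead and the environment are not counted, so the real
# requirement is somewhat higher than this estimate.
needed=$(find "$dir" -type f | wc -c | tr -d ' ')

echo "ARG_MAX:   $limit bytes"
echo "file list: about $needed bytes"
if [ "$needed" -ge "$limit" ]; then
    echo "the expanded list would not fit on one command line"
fi
```

Usage: `sh check_arglen.sh /some/data/dir/with/text/files/`. With millions of paths, the estimate alone typically runs to hundreds of megabytes, far above the values getconf reports on common systems.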


Solution

  • The kernel limits the total size of the argument list (together with the environment) that can be passed to a new program. With several million file names substituted into the wc command line, the command is bound to exceed that limit.

    Instead, use the -exec COMMAND {} + form of find:

    find /some/data/dir/with/text/files/ -type f -exec wc -l {} + > report.txt
    

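    Here, find appends each file it finds to the argument list of the command following -exec, in place of {}. Before the argument-length limit is reached, find runs the command on the batch collected so far, then starts a new batch with the remaining files, repeating until every file has been processed. Because wc may therefore run several times, report.txt can contain several "total" lines, one per invocation that received more than one file.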

    See the find man page for more details.


    Thanks to Charles Duffy for improvements to this answer.
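For the remaining step in the question, picking the minimum and maximum out of report.txt, a small awk pass is enough. This is a minimal sketch, assuming each line of report.txt is "COUNT PATH" as printed by wc -l, that wc ran in a C/English locale (so its summary lines read "total"), and that no path contains a newline. When find batches the files, wc may print one "total" line per invocation, so the sketch skips them; real paths never equal "total" because find prefixes them with the starting directory. The function name minmax_lines is hypothetical.

```shell
#!/bin/sh
# Print the files with the fewest and the most lines from "wc -l" output.
# Usage: minmax_lines report.txt
minmax_lines() {
    awk '
    {
        count = $1 + 0                   # force a numeric line count
        path = $0
        sub(/^ *[0-9]+ /, "", path)      # drop padding, the count, and one space
        if (path == "total") next        # skip per-batch summary lines from wc
        if (!seen++ || count < min) { min = count; min_path = path }
        if (seen == 1 || count > max) { max = count; max_path = path }
    }
    END {
        if (seen) {
            print "min", min, min_path
            print "max", max, max_path
        }
    }' "$1"
}
```

Usage: `minmax_lines report.txt` prints two lines such as `min 3 /path/to/file`. Ties keep the first file seen, and paths containing spaces survive because only the leading count is stripped rather than splitting on fields.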