I want a bash command that will return a table, where each row is the human-readable filesize, number of lines, and filename. The table should be sorted by filesize.
I've been trying to do this using a combination of du -hs, wc -l, sort -h, and find.
Here's where I'm at:
find . -exec echo $(du -h {}) $(wc -l {}) \; | sort -h
Your approach fell short not only because the shell expanded your command substitutions ($(...)) up front, but more fundamentally because you cannot pass shell command lines directly to find: find's -exec action can only invoke external utilities with literal arguments; the only non-literal argument supported is the {} representing the filename(s) at hand.
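To make the first point concrete, here is a small sketch that prints the command line find would actually receive. The scratch directory and the variable name seen_by_find are made up for illustration, and the sketch assumes that no file literally named {} exists in that directory.

```shell
# Work in a scratch directory so the demo does not touch real files.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/f.txt"

# Prefixing the original command with echo shows what the shell hands to find.
# du and wc run first, exactly once, against a literal file named {} (which
# does not exist here), so both substitutions expand to nothing.
seen_by_find=$(cd "$tmp" && echo find . -exec echo $(du -h {} 2>/dev/null) $(wc -l {} 2>/dev/null) \;)
printf '%s\n' "$seen_by_find"   # prints: find . -exec echo ;

rm -rf "$tmp"
```

By the time find starts, neither du nor wc is part of its arguments any more; all that is left is an echo with nothing to print.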
choroba's answer fixes your immediate problem by invoking a separate shell instance in each iteration, to which the shell command to execute is passed as a string argument (-exec bash -c '...' \;).
While this works (assuming you pass the {} value as an argument rather than embedding it in the command-line string), it is also quite inefficient, because multiple child processes are created for each input file.
(While there is a way to have find pass (typically) all input files to a (typically) single invocation of the specified external utility, namely with terminator + rather than \;, this is not an option here due to the nature of the command line passed.)
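For illustration, the per-file shell approach described above might look like the following sketch. It is not necessarily the exact command from choroba's answer; the function name list_per_file and the printf layout are my own assumptions. The filename is passed as $1 rather than being embedded in the command string.

```shell
# Hypothetical helper: starts one bash instance (plus du, cut and wc) per file.
list_per_file() {
  find "$1" -type f -exec bash -c '
    size=$(du -h -- "$1" | cut -f1)   # keep only the human-readable size
    lines=$(wc -l < "$1")             # reading from stdin suppresses the filename
    printf "%s\t%s\t%s\n" "$size" "$lines" "$1"
  ' _ {} \; | sort -h
}
```

Calling list_per_file . prints size, line count, and filename for every file below the current directory, at the cost of several child processes per file.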
An efficient and robust[1] implementation that minimizes the number of child processes created would look like this:
Note: I'm assuming GNU utilities here, due to use of head -n -1 and sort -h.
Also, I'm limiting find's output to files only (as opposed to directories), because wc -l only works on files.
paste <(find . -type f -exec du -h {} +) <(find . -type f -exec wc -l {} + | head -n -1) |
awk -F'\t *' 'BEGIN{OFS="\t"} {sub(" .+$", "", $3); print $1,$2,$3}' |
sort -h -t$'\t' -k1,1
Note the use of -exec ... + rather than -exec ... \;, which ensures that typically all input filenames are passed to a single invocation of the external utility (if not all filenames fit on a single command line, invocations are batched efficiently to make as few calls as possible).
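The difference between the two terminators can be observed with a harmless utility such as echo. This is only a sketch; the scratch directory and the variable names per_file and batched are made up for the demo.

```shell
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b" "$tmp/c"

# With \; echo runs once per file, so each filename lands on its own line.
per_file=$(find "$tmp" -type f -exec echo {} \; | awk 'END{print NR}')

# With + all three filenames are appended to a single echo call: one line.
batched=$(find "$tmp" -type f -exec echo {} + | awk 'END{print NR}')

printf 'lines with \\; : %s, lines with + : %s\n' "$per_file" "$batched"

rm -rf "$tmp"
```

With three files this reports 3 lines for \; and 1 line for +, that is, three echo invocations versus one.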
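wc -l {} + outputs a summary (total) line whenever it is passed more than one file, which head -n -1 strips away; it also outputs the filename after each line count. (If only a single file matches, wc prints no summary line, so head -n -1 would remove that file's own line instead.)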
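A quick way to see what wc emits and what head -n -1 removes, sketched with two throwaway files (the file names and variable names are invented for the demo; head -n -1 requires GNU head):

```shell
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/three.txt"
printf 'x\n'       > "$tmp/one.txt"

# Raw wc output: one "count filename" line per file, then a summary line.
with_total=$(wc -l "$tmp/three.txt" "$tmp/one.txt")
printf '%s\n' "$with_total"

# head -n -1 prints everything except the last line, dropping the summary.
without_total=$(wc -l "$tmp/three.txt" "$tmp/one.txt" | head -n -1)
printf '%s\n' "$without_total"

rm -rf "$tmp"
```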
paste combines the corresponding lines from the two commands (each command's output is fed to paste via a process substitution, <(...)) into a single output stream.
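The paste step can be tried in isolation. In this sketch two printf calls stand in for the two find commands, with made-up sizes and counts; process substitution requires bash (or another shell that supports it).

```shell
# paste reads each stand-in command's output through a process substitution
# and joins line N of the first with line N of the second, separated by a tab.
joined=$(paste <(printf '4.0K\t./a.txt\n8.0K\t./b.txt\n') \
               <(printf '3 ./a.txt\n10 ./b.txt\n'))
printf '%s\n' "$joined"
```

The pairing is purely positional, so the real pipeline relies on both find runs listing the files in the same order.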
The awk command then strips the extraneous filename that stems from wc from the end of each line.
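To see the awk step work on a single line, here is a sketch with an invented sample line in the shape paste produces; the variable names sample and cleaned are assumptions for the demo.

```shell
# du's "size<TAB>name", a tab, then wc's "count name"
# (GNU wc may left-pad the count with spaces).
sample=$(printf '4.0K\t./my file.txt\t  3 ./my file.txt')

# -F'\t *' splits on a tab plus any padding spaces, so $3 is "3 ./my file.txt";
# sub() deletes everything from the first space to the end, leaving the count.
cleaned=$(printf '%s\n' "$sample" |
  awk -F'\t *' 'BEGIN{OFS="\t"} {sub(" .+$", "", $3); print $1,$2,$3}')
printf '%s\n' "$cleaned"
```

The result has three tab-separated fields: size, filename, line count. Changing the print statement to print $1,$3,$2 would give the order asked for in the question (size, line count, filename).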
Finally, the sort command sorts the result by the 1st (-k1,1) tab-separated (-t$'\t') column by human-readable numbers (-h), such as the numbers that du -h outputs (e.g., 1K).
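A short sketch of the sort step on its own, using made-up rows in the shape the awk step emits (GNU sort and bash's $'\t' quoting are assumed):

```shell
# Rows are size, filename, line count; only the first column is the sort key.
sorted=$(printf '1.5M\t./big.log\t900\n512\t./tiny\t2\n4.0K\t./a.txt\t3\n' |
  sort -h -t$'\t' -k1,1)
printf '%s\n' "$sorted"
```

The rows come out as 512, 4.0K, 1.5M: -h compares the unit suffix first, so a plain number sorts before K, and K before M.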
[1] As with any line-oriented processing, filenames with embedded newlines are not supported, but I do not consider this a real-world problem.