The problem: I am trying to calculate an average file size for the directory I'm in (ignoring sub-directories) using one-liners. I have two methods:
ls -l | gawk '{sum += $5; n++;} END {print sum/n;}'
and
var1=$(du -Ss| awk '{print $1}') ; var2=$(ls -l | wc -l) ; echo $var1/$var2 | bc
They seem to yield similar numbers, albeit in different units (the first one in kB, the second one in MB).
The numbers themselves, however, are slightly wrong. What's going on? Which one is more right?
du and ls report sizes differently. Consider this part of the du man page:
--apparent-size
print apparent sizes, rather than disk usage; although the
apparent size is usually smaller, it may be larger due to holes
in ('sparse') files, internal fragmentation, indirect blocks,
and the like
That gives an idea about the possible differences between what ls shows (apparent size) and what du shows (by default, the actual disk usage).
$ truncate -s 10737418240 sparse
$ ls -l sparse
-rw-rw-r-- 1 ec2-user ec2-user 10737418240 Feb 20 00:19 sparse
$ du sparse
0 sparse
$ ls -ls sparse
0 -rw-rw-r-- 1 ec2-user ec2-user 10737418240 Feb 20 00:19 sparse
The above shows the difference in reporting for a sparse file.
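To see both figures side by side for any file, something like the following could be used. This is only a sketch: size_report is a made-up helper name, and it assumes GNU du (for --apparent-size and --block-size).

```shell
# Hypothetical helper: print apparent size and allocated size, both in bytes.
# Assumes GNU coreutils du; --block-size=1 makes du report bytes.
size_report() {
    # Apparent size: what ls -l shows in its size column.
    apparent=$(du --apparent-size --block-size=1 "$1" | cut -f1)
    # Allocated size: what du shows by default (disk blocks actually used).
    allocated=$(du --block-size=1 "$1" | cut -f1)
    echo "apparent=$apparent allocated=$allocated"
}
```

Running size_report sparse on the file above should print the full 10737418240 as the apparent size, while the allocated figure depends on the filesystem.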
Also, counting files with ls -l | wc -l includes subdirectories, symlinks, etc., as well as the leading "total" line that ls -l prints. You can instead use find to show only regular files:
find . -maxdepth 1 -type f
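Building on that find command, the average could be computed along these lines. This is a sketch rather than a definitive answer: the function names are made up, and it assumes GNU find, whose -printf supports %s (size in bytes) and %k (disk space used, in 1 KB blocks).

```shell
# Hypothetical helpers; both take an optional directory (default: current one).

# Average apparent size in bytes of regular files, ignoring subdirectories.
avg_file_size() {
    find "${1:-.}" -maxdepth 1 -type f -printf '%s\n' |
        awk '{ sum += $1; n++ } END { if (n) print sum / n; else print 0 }'
}

# Average disk usage in 1 KB blocks of the same files (comparable to du).
avg_disk_usage() {
    find "${1:-.}" -maxdepth 1 -type f -printf '%k\n' |
        awk '{ sum += $1; n++ } END { if (n) print sum / n; else print 0 }'
}
```

Calling avg_file_size with no argument uses the current directory. The two results will differ for the same reasons ls and du differ, so pick whichever notion of size you actually care about.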