Formatting du command output


Is there a way to format (in a shell command chain) the following output of du -s -k *

287720  crm-cc
21500   crm-mvh
40360   elasticsearch-5.1.2
293292  electron-quick-start
44636   hexagon
193572  jpk
132 knights
209860  pink-panther
1722104 popc
4   server-config.txt
45392   sigb-backend
47468   test
58904   um-report
164156  zeus

In the following way:

1,763,434,496 popc
  300,331,008 electron-quick-start
  294,625,280 crm-cc
  214,896,640 pink-panther
  198,217,728 jpk
  168,095,744 zeus
   60,317,696 um-report
   48,607,232 test
   46,481,408 sigb-backend
   45,707,264 hexagon
   41,328,640 elasticsearch-5.1.2
   22,016,000 crm-mvh
      135,168 knights
        4,096 server-config.txt

By that I mean:

  • Sort the files/directories by their size in descending order.
  • Insert a thousands separator every three characters counting from the right.
  • Insert leading spaces to align the sizes to the right.

I have implemented this in PHP, but I was hoping for a more universal solution (not every Linux-based OS has the PHP interpreter installed).

If it's relevant, I'm using zsh most of the time, so the solution can be limited to this shell.


Solution

  • As an example, let's consider a directory with these files:

    $ du -sk *
    12488   big.log
    200     big.pdf
    4       f1
    2441412 output.txt
    160660  program.zip
    4       smallfile
    4       some.txt
    

    To reformat du as you desire:

    $ du -sk * | sort -rn | sed -E ':a; s/([[:digit:]]+)([[:digit:]]{3})/\1,\2/; ta' | awk -F'\t' '{printf "%10s %s\n",$1,substr($0,length($1)+2)}'
     2,441,412 output.txt
       160,660 program.zip
        12,488 big.log
           200 big.pdf
             4 some.txt
             4 smallfile
             4 f1
    

    Note: This approach works even with file names that contain spaces, because awk splits only on the tab that du prints after the size.

    Shell function for convenience

    Since the above is a lot to type, let's create a shell function:

    $ dusk() { du -sk "$@" | sort -rn | sed -E ':a; s/([[:digit:]]+)([[:digit:]]{3})/\1,\2/; ta' | awk -F'\t' '{printf "%10s %s\n",$1,substr($0,length($1)+2)}';}
    

    We can use the shell function as follows:

    $ dusk *
     2,441,412 output.txt
       160,660 program.zip
        12,488 big.log
           200 big.pdf
             4 some.txt
             4 smallfile
             4 f1
    

    How it works

    • du -sk *

      This is our du command: -s prints one total per argument, and -k reports sizes in units of 1024 bytes.

    • sort -rn

      This does a numerical sort in reverse order so that the largest entries come first.

    • sed -E ':a; s/([[:digit:]]+)([[:digit:]]{3})/\1,\2/; ta'

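      This puts commas where we want them. The substitution splits the last three digits off a run of four or more digits and puts a comma before them; ta branches back to the label :a whenever a substitution was made, so the loop repeats until no run of four or more digits remains.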

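    • awk -F'\t' '{printf "%10s %s\n",$1,substr($0,length($1)+2)}'

      This right-justifies the numbers. With tab as the field separator, $1 is the size, which printf pads to a width of 10; substr then prints the rest of the line (the name) unchanged.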
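
    One caveat with the sed step: the pattern is not anchored, so a run of four or more digits inside a file name would get commas as well. A minimal sketch of one way around that, assuming the size always starts the line as it does in du output, is to anchor the pattern with ^. The function name commify_first_field and the sample line are made up for this example:

```shell
# Hypothetical helper: add thousands separators only to the number that
# starts each line, leaving any digits in the file name alone.
commify_first_field() {
    sed -E -e :a -e 's/^([[:digit:]]+)([[:digit:]]{3})/\1,\2/' -e ta
}

# Example with a made-up du-style line (size, tab, name):
printf '2441412\toutput2017.txt\n' | commify_first_field
```

    Here the size becomes 2,441,412 while output2017.txt is left untouched. The helper would take the place of the sed stage in the pipeline; the awk stage after it stays the same.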

    Multiline version

    For those who prefer their commands spread out over multiple lines:

    du -sk * |
        sort -rn |
        sed -E ':a; s/([[:digit:]]+)([[:digit:]]{3})/\1,\2/; ta' |
        awk -F'\t' '{printf "%10s %s\n",$1,substr($0,length($1)+2)}'
    

    Compatibility with Mac OSX/BSD

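    BSD sed may treat everything after :a, including the semicolon, as part of the label name, so the script is split into separate -e expressions. Try this and see if it works on OSX: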

    $ echo 1234567890 | sed -E -e :a -e 's/([[:digit:]]+)([[:digit:]]{3})/\1,\2/' -e ta 
    1,234,567,890
    

    If that works, then let's revise the complete command to be:

    du -sk * | sort -rn | sed -E -e :a -e 's/([[:digit:]]+)([[:digit:]]{3})/\1,\2/' -e ta  | awk -F'\t' '{printf "%10s %s\n",$1,substr($0,length($1)+2)}'
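
    The sample output in the question shows bytes rather than kilobytes: each figure is the du -k value multiplied by 1024 (for example, 4 becomes 4,096). As a hedged sketch, the conversion, the commas and the alignment could all be done in a single awk step, which also sidesteps the sed differences between GNU and BSD. The function name du_bytes_fmt and the column width of 13 are assumptions chosen to match the question's sample, not part of the original answer:

```shell
# Hypothetical helper: reads "KiB<TAB>name" lines as printed by du -sk,
# converts the size to bytes, adds commas and right-aligns it.
du_bytes_fmt() {
    awk -F'\t' '
    # Insert a comma before every group of three digits, from the right.
    function commify(n,    out) {
        out = ""
        while (length(n) > 3) {
            out = "," substr(n, length(n) - 2) out
            n = substr(n, 1, length(n) - 3)
        }
        return n out
    }
    {
        # %.0f avoids the integer-range limits that %d has in some awk versions.
        bytes = sprintf("%.0f", $1 * 1024)
        printf "%13s %s\n", commify(bytes), substr($0, length($1) + 2)
    }'
}

# Usage: du -sk * | sort -rn | du_bytes_fmt
```

    Note that the result is disk usage expressed in bytes (allocated blocks times 1024), which can differ from the files' apparent sizes.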