bash find duplicates size filenames

find files returning basename and size


In bash, I can get the basename (name without path) of found files like this:

find . -exec basename {} \;

and I can get the file size like this:

find . -exec ls -l {} \; | awk '{print $5}'

but I need to get the basename and filesize separated by a space.

How do I combine those two commands correctly using one find operation? This code does not work:

find . -exec basename {} \; -exec ls -l {} | awk '{print $5}' \;

awk: can't open file ;
 source line number 1
find: -exec: no terminating ";" or "+"

I am trying to create a fast duplicate file finder. Using this list, I would do a sort and then use uniq to find all files that are duplicates using the criteria: a duplicate = same "basename" & same "size" (without an md5 check).
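The sort-and-uniq step described above can be sketched as follows. This is a minimal sketch, assuming the list has one entry per line with the size and basename in a fixed order (here "size<TAB>basename"); the function name report_duplicates is hypothetical.

```shell
# report_duplicates (hypothetical helper): read one "size<TAB>basename"
# entry per line on stdin and print each entry that occurs more than once.
report_duplicates() {
  # LC_ALL=C gives a byte-wise sort, so identical lines always end up adjacent;
  # uniq -d then prints one copy of every repeated line.
  LC_ALL=C sort | uniq -d
}

# Example: two files named x.txt of 5 bytes each, one y.txt of 2 bytes.
printf '5\tx.txt\n2\ty.txt\n5\tx.txt\n' | report_duplicates
# prints: 5<TAB>x.txt
```

Note that this only reports which size and name pairs are duplicated, not where the files live; the paths would have to be carried along in the list to locate them.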

So far, just making this initial list is where I am hung up syntactically (and maybe programmatically). Please let me know if you have a better method. I am trying to make it work using the most basic bash commands so it works on both Linux and Mac without installing anything.


Solution

  • GNU systems

    For GNU systems, use this command:

    find . -printf '%k\t%f\n'
    

    This prints the size and basename of each file.

    • %k prints the disk space used by the file in 1 KB blocks (use %s instead for the exact size in bytes)
    • \t literal tab character
    • %f prints the filename with leading directories removed
    • \n literal newline character

  • OSX

    For OSX, use this command, because the BSD find it ships with does not support -printf:

    find . -exec bash -c 'printf "%s\t%s\n" "$(stat -f %z "$1")" "$(basename "$1")"' - {} \;
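
Since the question asks for something that runs unchanged on both Linux and Mac, a single portable variant may be useful. The following is a minimal sketch, not part of the original answer: it assumes only POSIX find, sh and wc, replaces stat and -printf with wc -c, and uses a hypothetical function name list_name_size.

```shell
# list_name_size DIR (hypothetical helper): print "size<TAB>basename" for
# every regular file under DIR (default: current directory).
list_name_size() {
  find "${1:-.}" -type f -exec sh -c '
    for f; do
      # wc -c < file prints the byte count; some implementations pad it
      # with spaces, so the arithmetic expansion normalizes it to a bare number.
      # ${f##*/} strips everything up to the last slash, like basename.
      printf "%s\t%s\n" "$(( $(wc -c < "$f") ))" "${f##*/}"
    done
  ' sh {} +
}

# Usage: list all files under the current directory.
list_name_size .
```

Using {} + passes many files to each sh invocation, so far fewer processes are started than with one bash per file via \;.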