How do I copy files distributed throughout a deeply-nested sub-directory, to another sub-directory which is not nested at all (i.e., is flat)? To heighten the difficulty level, I have these constraints/wrinkles.
I've tried cp, find, xargs, parallel, uuidgen, md5sum, Bash for loops, and various combinations thereof with limited success. The best I've been able to achieve is generating a random UUID for each file. That's OK, I guess, but it's not exactly the "content-addressing" I'd like, because I'd like to de-dupe the files based on their content.
For reference, that looks like this, where source and dest are the source and destination sub-directories.
find source/* -type f -exec sh -c 'for f; do cp "$f" 'dest'/"$(uuidgen)"; done' Renamer {} +
Though UUIDs are nice, I don't have my heart set on them and am open to other ideas, modulo the constraints above.
Thanks!
Use md5sum to calculate the MD5 hash of each file's content and use that hash as the destination file name:
find source -type f -exec sh -c 'for f; do cp "$f" dest/"$(md5sum "$f" | sed -e "s/[[:space:]].*//")"; done' _ {} +
This uses sed to strip the filename from the output of md5sum, rather than the usual md5sum <file> | awk '{print $1}', so that I don't have to think about escaping single quotes inside the sh -c script.
Of course, you might have hash collisions with MD5, but you can easily switch the hashing to use sha256sum or whatever you like.
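As a rough sketch of that switch, here is the same idea wrapped in a shell function that uses sha256sum and skips files whose content is already in the destination. The function name flatten_by_hash is made up for illustration, and it assumes sha256sum (GNU coreutils) is installed and that the destination is not inside the source tree.

```shell
#!/bin/sh
# Sketch: copy every regular file under a source tree into a flat
# destination directory, naming each copy after the SHA-256 of its content.
# flatten_by_hash is a hypothetical helper name, not a standard tool.
flatten_by_hash() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    find "$src" -type f -exec sh -c '
        dest=$1
        shift
        for f do
            # Reading from stdin keeps the file name out of the output,
            # so only the hash and a "-" placeholder are printed.
            hash=$(sha256sum < "$f" | cut -d " " -f 1)
            # Skip files that could not be hashed (e.g. unreadable).
            [ -n "$hash" ] || continue
            # Identical content yields the same name, so an existing
            # copy means this file is a duplicate.
            [ -e "$dest/$hash" ] || cp -- "$f" "$dest/$hash"
        done
    ' sh "$dest" {} +
}
```

It would be called as flatten_by_hash source dest. Passing the destination as a positional argument, rather than splicing it into the script text, avoids the quote juggling in the one-liner.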