I am processing a large list of (md5, dir/filename) pairs. I need to insert
the file size into each entry, so that the list becomes (md5, size, dir/filename)
3-tuples.
The relevant snippet of the data file is:
file MD5sum-stage1A.txt
...
d9c6be18d35619c7532f9c94f5a9bf58 /mnt/dir1/dir2/branch1/04 05 Custom .mp4
01c0fadb91c8ef0815a7753ad25a8c1c /mnt/dir1/dir2/branch1/branch2/Using the -proc directory and the $$ Variable.odt
...
EOF
The second data line is the problem: its filename contains $$.
The code works so far, except for this one case. Here is the code as it stands:
someone@system01:~/tmp$ awk 'NR==15522, NR==15523 {
> md5=$1
> file=substr($0,35)
> size="###"
> cmd=sprintf("stat --format=%s \"%s\"", "%s",file)
> cmd | getline size
> close(cmd)
> printf "%s\t%s\t%s\n",md5, size, file
> }' MD5sum-stage1A.txt
d9c6be18d35619c7532f9c94f5a9bf58 6747587 /mnt/dir1/dir2/dir3/04 05 Custom .mp4
stat: cannot stat '/mnt/dir1/dir2/Using the -proc directory and the 20483 Variable.odt': No such file or directory
01c0fadb91c8ef0815a7753ad25a8c1c ### /mnt/dir1/dir2/Using the -proc directory and the $$ Variable.odt
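So far the code copes with the nuances of the shell, handling spaces and most other characters in a filename. However, the shell that runs the stat command appears to substitute the '$$' with its process ID (20483 in the error message above), presumably because the filename is wrapped in double quotes, inside which the shell still expands $.
With Awk, how can this behaviour be mitigated?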
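One possible mitigation, sketched here under assumptions rather than offered as the definitive fix, is to have awk wrap the filename in single quotes before handing the command to the shell, since sh expands nothing inside single quotes. The names add_sizes and shquote are hypothetical helpers invented for this illustration, GNU stat is assumed (as in the code above), and the demo file and checksum are placeholders.

```shell
#!/bin/sh
# add_sizes (hypothetical name): read "md5  path" lines from the file "$1"
# and print md5<TAB>size<TAB>path.
add_sizes() {
    awk -v q="'" '
    # Wrap s in single quotes so sh expands nothing inside it; an embedded
    # single quote is closed, emitted inside double quotes, then reopened.
    function shquote(s) {
        gsub(q, q "\"" q "\"" q, s)
        return q s q
    }
    {
        md5  = $1
        file = substr($0, 35)      # md5sum output: 32 hex digits + 2 separators
        size = "###"               # placeholder kept if stat fails
        cmd  = "stat --format=%s -- " shquote(file)
        cmd | getline size
        close(cmd)
        printf "%s\t%s\t%s\n", md5, size, file
    }' "$1"
}

# Demo in a throwaway directory; the checksum is a placeholder, not the
# real md5 of the demo file.
tmpdir=$(mktemp -d)
printf 'hello' > "$tmpdir/the \$\$ Variable.odt"
printf '%s  %s\n' 01c0fadb91c8ef0815a7753ad25a8c1c \
    "$tmpdir/the \$\$ Variable.odt" > "$tmpdir/list.txt"
add_sizes "$tmpdir/list.txt"
rm -rf "$tmpdir"
```

The single quote is passed in with -v q="'" because the awk program itself sits inside single quotes on the command line, so it cannot contain that character literally.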
Many thanks for the answers; they gave me a few ideas and hopefully streamlined and optimised the initial task. In the spirit of sharing, and in case it is of use to someone else, here is what I ended up with:
find /media/BACKUPS/foo -not -path '*/\.*' -type f -exec md5sum {} \; -printf '%s\n' | awk 'NF==1{printf "%s\t%s\t%s\n",hash,$1,rest;next}{hash=$1;rest=substr($0,index($0,$2))}' > MD5sum-dataset-foo.txt
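To make the awk stage of that pipeline easier to follow, here is the same logic spread over several lines and fed two sample lines instead of real find output. It is only an illustration: pair_sizes is a hypothetical name, and the sample lines are taken from the data snippet and output shown earlier.

```shell
#!/bin/sh
# pair_sizes (hypothetical name): merge each md5sum line with the size
# line that follows it on standard input.
pair_sizes() {
    awk '
    NF == 1 {                               # a lone number: the size from -printf
        printf "%s\t%s\t%s\n", hash, $1, rest
        next
    }
    {                                       # an md5sum line: remember it
        hash = $1
        rest = substr($0, index($0, $2))    # path from its first word onward
    }'
}

# Sample input in the order the pipeline emits it: md5sum line, then size.
printf '%s\n' \
    'd9c6be18d35619c7532f9c94f5a9bf58  /mnt/dir1/dir2/branch1/04 05 Custom .mp4' \
    '6747587' |
    pair_sizes
```

Because find writes the size itself via -printf, no filename is ever passed through a shell, so the $$ expansion cannot occur.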
And again, many thanks ...