I have a directory of zipped files, each containing a group of XML files. I need to write a script that extracts XML files from those ZIPs if they contain a certain string.
for z in `ls /path/to/archives/*.zip`
do for f in `unzip -l $z | grep 'xml' | awk -F" " '{print "$4" "$5}'`
do r = $( unzip -p $z $f | grep $string )
if [ '$r' != '' ]
unzip $z $f
fi
done
done
When this runs on A.zip, which contains a file called 'my file.xml', the loop treats that name as two files, 'my' and 'file.xml'. unzip then tries to extract a file named 'my' from A.zip, which fails.
Any ideas on how to force the for loop not to consider the space in the file name as a separator?
Use the -Z1 option of unzip instead of -l. It outputs one file name per line with no additional information. Read its output with a while read loop instead of looping over it with for, to prevent word splitting. You might still have problems with filenames containing a newline (but I wasn't able to zip them: $'a\nb' was stored as a^Jb and extracted as ab).
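To make the word-splitting difference concrete, here is a minimal sketch that does not need a real archive. The list_names function is a hypothetical stand-in that prints names one per line, the way unzip -Z1 would; it is not part of unzip.

```shell
# Hypothetical stand-in for `unzip -Z1 archive.zip`: one name per line.
list_names() {
    printf '%s\n' 'my file.xml' 'other.xml'
}

# for + command substitution: the output is split on spaces as well as newlines.
for_count=0
for f in $(list_names); do
    for_count=$((for_count + 1))
done

# while read: each line is taken whole, spaces included.
read_count=0
while IFS= read -r f; do
    read_count=$((read_count + 1))
done <<EOF
$(list_names)
EOF

echo "for loop saw $for_count names, while read saw $read_count"
# prints: for loop saw 3 names, while read saw 2
```

The here-document is used only to keep the sketch portable; feeding the loop from a process substitution works the same way in bash.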
Also, your if is missing a then.
Also, don't parse the output of ls; you can iterate over the globbed file mask itself.
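As a rough illustration of why the glob is safer than ls, the sketch below creates two empty files in a temporary directory (the file names are made up for the example, and they are not real archives) and counts what each style of loop sees.

```shell
# Scratch directory with one name that contains a space (illustrative names only).
dir=$(mktemp -d)
: > "$dir/first archive.zip"
: > "$dir/second.zip"

# Iterating over the glob: each match arrives as one word.
glob_count=0
for z in "$dir"/*.zip; do
    [ -e "$z" ] || continue   # skip the literal pattern if nothing matched
    glob_count=$((glob_count + 1))
done

# Parsing ls: the name with a space is split into two words.
ls_count=0
for z in $(ls "$dir"/*.zip); do
    ls_count=$((ls_count + 1))
done

echo "glob: $glob_count, ls: $ls_count"
rm -rf "$dir"
```

With two files present, the glob loop runs twice, while the ls loop runs at least three times because 'first archive.zip' is broken at the space.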
You don't need to check that grep outputs anything; just run it with -q and check its exit status.
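A small sketch of the exit-status check, using printf in place of unzip -p so it runs anywhere. The search string and the XML snippets are invented for the example.

```shell
string='needle'   # hypothetical search string

# grep -q prints nothing; it exits 0 on a match and non-zero otherwise,
# so it can be used directly as the condition of an if.
if printf '%s\n' '<a>needle</a>' | grep -q "$string"; then
    first=yes
else
    first=no
fi

if printf '%s\n' '<a>hay</a>' | grep -q "$string"; then
    second=yes
else
    second=no
fi

echo "first: $first, second: $second"
# prints: first: yes, second: no
```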
Don't forget to double-quote variables that might contain whitespace or other special characters.
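The effect of quoting can be checked with a tiny helper that reports how many arguments it received. count_args is a hypothetical name used only for this sketch, and the behavior shown is that of sh and bash (zsh does not split unquoted variables by default).

```shell
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

f='my file.xml'

unquoted=$(count_args $f)     # split on the space: two arguments
quoted=$(count_args "$f")     # kept together: one argument

echo "unquoted: $unquoted, quoted: $quoted"
# prints: unquoted: 2, quoted: 1
```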
for z in /path/to/archives/*.zip ; do
    while IFS= read -r f ; do
        if unzip -p "$z" "$f" | grep -q "$string" ; then
            unzip "$z" "$f"
        fi
    done < <(unzip -Z1 "$z" '*.xml')
done
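If the loop is needed in more than one place, it could be wrapped in a function along these lines. This is only a sketch under a few assumptions that are not in the answer above: the name extract_matching and its three arguments are hypothetical, the search string is matched literally (grep -F), existing files are overwritten (-o), and matches are written to a separate destination directory (-d). It pipes into while read instead of using process substitution so that it also runs under plain sh.

```shell
# extract_matching DIR STRING DEST   (hypothetical helper, not from the answer)
# Extracts every *.xml member that contains STRING from each ZIP in DIR into DEST.
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
extract_matching() (
    dir=$1 string=$2 dest=$3
    for z in "$dir"/*.zip; do
        [ -e "$z" ] || continue            # no archives matched the glob
        unzip -Z1 "$z" '*.xml' 2>/dev/null |
        while IFS= read -r f; do
            # -F: treat STRING as fixed text; --: STRING may start with a dash
            if unzip -p "$z" "$f" | grep -qF -- "$string"; then
                unzip -o -q "$z" "$f" -d "$dest"
            fi
        done
    done
)
```

It would be called as, for example, extract_matching /path/to/archives "$string" ./out. Note that unzip treats member names as wildcard patterns, so names containing characters such as [ or * may need extra care.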