Python2.7 search zipfiles for .kml containing string without unzipping

I am trying to write my first Python script, shown below. I want to search through a read-only archive on an HPC system, looking inside zip files that sit in folders alongside a variety of other folder and file types. If a zip contains a .kml file, I want to print the line in it that starts with the string <coordinates>.

import zipfile as z 
kfile = file('*.kml') #####breaks here#####
folderpath = '/neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/01/21'  # folder with multiple folders and .zips
for zipfile in folderpath:  # am only interested in the .kml files within the .zips
    if kfile in zipfile:
        with read(kfile) as k:
            for line in k:
                if '<coordinates>' in line:  # only want the coordinate line
                    print line  # print the coordinates
k.close()

Eventually I want to loop this through multiple folders rather than pointing to one exact folder location, i.e. loop through every subfolder under /neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/, but this is a starting point for me to try to understand how Python works.

I am sure there are many problems to fix before this script will run, but the current one I have is:

kfile = file('*.kml')
IOError: [Errno 22] invalid mode ('r') or filename: '*.kml'
Process finished with exit code 1

Any help appreciated to get this simple process script working.

Solution

When you run:

kfile = file('*.kml')

You are trying to open a single file named exactly *.kml, which is not what you want. If you want to process all *.kml files, you will need to (a) get a list of matching files and then (b) process each file in that list.

There are a number of ways to accomplish the above; the easiest is probably the glob module, which can be used something like this:

import glob
for kfilename in glob.glob('*.kml'):
    print kfilename

However, if you are trying to process a directory tree, rather than a single directory, you may instead want to investigate the os.walk function. From the docs:

Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

A simple example might look something like this:

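import os
# Walk the whole tree rooted at topdir/, one directory at a time
for root, dirs, files in os.walk('topdir/'):
    kfilenames = [fn for fn in files if fn.endswith('.kml')]
    for kfilename in kfilenames:
        # Join with root to get the full path, not just the bare filename
        print os.path.join(root, kfilename)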
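
Note that in your case the .kml files live inside zip archives, so neither glob nor os.walk will see them directly. Here is a minimal sketch, assuming you want to combine os.walk with the standard zipfile module to list the .kml members of every zip under a directory. The function name find_kml_members is hypothetical, chosen for illustration only:

```python
import os
import zipfile


def find_kml_members(topdir):
    """Yield (zip_path, member_name) for each .kml entry in zips under topdir."""
    for root, dirs, files in os.walk(topdir):
        for fn in files:
            zip_path = os.path.join(root, fn)
            # is_zipfile inspects the file contents, not just the extension
            if not zipfile.is_zipfile(zip_path):
                continue
            # Read the member list while the archive is open, then close it
            with zipfile.ZipFile(zip_path) as zf:
                members = [m for m in zf.namelist()
                           if m.lower().endswith('.kml')]
            for member in members:
                yield zip_path, member
```

This only reads each archive's table of contents; nothing is extracted to disk, which should suit a read-only archive.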

Additional commentary

Iterating over strings

Your script has:

for zipfile in folderpath:

That will simply iterate over the characters in the string folderpath. E.g., the output of:

folderpath = '/neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/01/21'
for zipfile in folderpath:
    print zipfile

Would be:

/
n
e
o
d
c
/
s
e
n
t
i
n
e
l
1
a
/

...and so forth.
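
To iterate over the entries of a directory rather than the characters of its path, you need to ask the operating system for a listing. A minimal sketch using os.listdir follows; the helper name list_zip_paths is hypothetical, and it matches on the file extension only, without checking that each file really is a zip archive:

```python
import os


def list_zip_paths(folderpath):
    """Return the full paths of entries in folderpath whose name ends in .zip."""
    return sorted(
        os.path.join(folderpath, name)
        for name in os.listdir(folderpath)
        if name.lower().endswith('.zip')
    )
```

Unlike os.walk, os.listdir does not descend into subfolders, so this covers a single directory only.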

read is not a context manager

Your code has:

with read(kfile) as k:

There is no read built-in, and the .read method on files cannot be used as a context manager.
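
What does work with a with statement is the object returned by zipfile.ZipFile (from Python 2.7 onwards), and its read method returns the contents of one archive member without unpacking anything to disk. A minimal sketch, assuming you already know the zip path and the member name; read_zip_member is a hypothetical helper name:

```python
import zipfile


def read_zip_member(zip_path, member_name):
    """Return the raw bytes of one archive member without extracting it."""
    # The archive is closed automatically when the with block exits
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member_name)
```

Because the archive is closed for you when the block ends, the trailing k.close() in your script would not be needed.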

KML is XML

You're looking for "lines beginning with <coordinates>", but KML files are not line-based. An entire KML document could be a single line and it would still be valid.

You are much better off using an XML parser to parse XML.
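
As a rough sketch of that approach, assuming the KML content is already in memory as bytes, the standard library's xml.etree.ElementTree can pull out the text of every coordinates element regardless of how the document is split into lines. The function name extract_coordinates is hypothetical, and the namespace handling is a simplification that compares only the local part of each tag:

```python
import xml.etree.ElementTree as ET


def extract_coordinates(kml_bytes):
    """Return the text of every <coordinates> element in a KML document."""
    root = ET.fromstring(kml_bytes)
    results = []
    for elem in root.iter():
        # Namespaced tags look like '{namespace-uri}coordinates',
        # so keep only the part after the closing brace
        local_name = elem.tag.rsplit('}', 1)[-1]
        if local_name == 'coordinates' and elem.text:
            results.append(elem.text.strip())
    return results
```

This returns a list because a KML document may contain more than one coordinates element.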