Search code examples
parsingcsvxlrd

python xlrd: convert xls to csv using tempfiles. Tempfile is empty


I am downloading an xls file from the internet. It is in .xls format but I need 'Sheet1' to be in csv format. I use xlrd to make the conversion but seem to have run into an issue where the file I write to is empty?

import urllib2
import tempfile
import csv
import xlrd

url_2_fetch = ____
u = urllib2.urlopen(url_2_fetch)
wb = xlrd.open_workbook(file_contents=u.read())
sh = wb.sheet_by_name('Sheet1')
csv_temp_file = tempfile.TemporaryFile()
with open('csv_temp_file', 'wb') as f:
    writer = csv.writer(f)
    for rownum in xrange(sh.nrows):
        writer.writerow(sh.row_values(rownum))

That seemed to have worked. But now I want to inspect the values by doing the following:

with open('csv_temp_file', 'rb') as z:
    reader = csv.reader(z)
    for row in reader:
        print row

But I get nothing:

>>> with open('csv_temp_file', 'rb') as z:
...     reader = csv.reader(z)
...     for row in reader:
...             print row
...
>>>

I am using a tempfile because I want to do more parsing of the content and then use SQLAlchemy to store the contents of the csv post more parsing to a mySQL database.

I appreciate the help. Thank you.


Solution

  • This is completely wrong:

    csv_temp_file = tempfile.TemporaryFile()
    with open('csv_temp_file', 'wb') as f:
        writer = csv.writer(f)
    

    The tempfile.TemporaryFile() call returns "a file-like object that can be used as a temporary storage area. The file will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected)."

    So your variable csv_temp_file contains a file object, already open, that you can read and write to, and will be deleted as soon as you call .close() on it, overwrite the variable, or cleanly exit the program.

    So far so good. But then you proceed to open another file with open('csv_temp_file', 'wb') that is not a temporary file, is created in the script's current directory with the fixed name 'csv_temp_file', is overwritten every time this script is run, can cause security holes, strange bugs and race conditions, and is not related to the variable csv_temp_file in any way.

    You should trash the with open statement and use the csv_temp_file variable you already have. You can try to .seek(0) on it before using it again with the csv reader, it should work. Call .close() on it when you are done with it and the temporary file will be deleted.