Search code examples
pythonexcelurllib2xlrd

Process File in memory using python


I am reading some data files stored as excel from online. My current process involves downloading the file to disk using the retrieve function defined below which uses the urllib2 library and then parses the excel document using the traverseWorkbook function. The traverse function uses the xlrd library for parsing the excel.

I would like to perform the same operation without requiring downloading the file to disk but will prefer to keep the file in memory and parse it memory.

Not sure how to even proceed, but I'm sure its possible.

def retrieveFile(url, filename):
    try:
        req = urllib2.urlopen(url)
        CHUNK = 16 * 1024
        with open(filename, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk: break
                    fp.write(chunk)
        return True
    except Exception, e:
        return None


def traverseWorkbook(filename):
    values = []

    wb = open_workbook(filename)
    for s in wb.sheets():
        for row in range(s.nrows):
           if row > 10:
               rowData = processRow(s, row, type)
               if rowData:
                   values.append(rowData)

Solution

  • You can read the entire file into memory using:

    data = urllib2.urlopen(url).read()
    

    Once the file is in memory, you can load it into xlrd using the file_contents argument of open_workbook:

    wb = xlrd.open_workbook(url, file_contents=data)
    

    Pass the url in as the filename as the documentation states it might be used in messages; otherwise, it will be ignored.

    Thus, your traverseWorbook method can be rewritten as:

    def traverseWorkbook(url):
        values = []
        data = urllib2.urlopen(url).read()
        wb = xlrd.open_workbook(filename, file_contents=data)
        for s in wb.sheets():
            for row in range(s.nrows):
            if row > 10:
                rowData = processRow(s, row, type)
                if rowData:
                    values.append(rowData)
        return values