Search code examples
pythonexceldictionaryxlrd

Storing the columns of a spreadsheet in a Python dictionary


I have a table stored in an Excel file as follows:

Species     Garden Hedgerow Parkland Pasture Woodland
Blackbird       47       10      40        2        2
Chaffinch       19        3       5        0        2
Great Tit       50        0      10        7        0
House Sparrow   46       16       8        4        0
Robin            9        3       0        0        2
Song Thrush      4        0       6        0        0

I am using the xlrd Python library for reading these data. I have no problem reading it into a list of lists (with each line of the table stored as a list), using the code below:

from xlrd import open_workbook
wb = open_workbook("Sample.xls")
headers = []
sdata = []
for s in wb.sheets():
    print "Sheet:",s.name
    if s.name.capitalize() == "Data":
        for row in range(s.nrows):
            values = []
            for col in range(s.ncols):
                data = s.cell(row,col).value
                if row == 0:
                    headers.append(data)
                else:   
                    values.append(data)
            sdata.append(values)

As is probably obvious, headers is a simple list storing the column headers and sdata contains the table data, stored as a list of lists. Here is what they look:

headers:

[u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']

sdata:

[[u'Blackbird', 47.0, 10.0, 40.0, 2.0, 2.0], [u'Chaffinch', 19.0, 3.0, 5.0, 0.0, 2.0], [u'Great Tit', 50.0, 0.0, 10.0, 7.0, 0.0], [u'House Sparrow', 46.0, 16.0, 8.0, 4.0, 0.0], [u'Robin', 9.0, 3.0, 0.0, 0.0, 2.0], [u'Song Thrush', 4.0, 0.0, 6.0, 0.0, 0.0]]

But I want to store these data into a Python dictionary, with each column as the key for a list containing all values for each column. For example (only part of the data is shown to save space):

dict = {
    'Species': ['Blackbird','Chaffinch','Great Tit'], 
    'Garden': [47,19,50], 
    'Hedgerow': [10,3,0], 
    'Parkland': [40,5,10], 
    'Pasture': [2,0,7], 
    'Woodland': [2,2,0]
} 

So, my question is: how can I achieve this? I know I could read the data by columns instead of by rows as in the code snippet above, but I could not figure out how to store the columns in a dictionary.

Thanks in advance for any assistance you can provide.


Solution

  • Once you have the columns, it's fairly easy:

    dict(zip(headers, sdata))
    

    Actually, it looks like sdata in your example may be the row data, even so, that's still fairly easy, you can transpose the table with zip as well:

    dict(zip(headers, zip(*sdata)))
    

    One of these two is what you are asking for.