Search code examples
pythonexcelxlrd

Having trouble with lists and matrix using xlrd and python


So my problem is to take a couple of long vectors (each being 15000 rows long) with the appearence:

Origin     Destination     Distance

and the corresponding values in the columns. I want however to convert these, using Python and the xlrd package, into a distance matrix having

          Destination1     Destination2
Origin1   Distance11       Distance12
Origin2   Distance21       Distance22

and so forth.

What I have tried thus far is:

matrix ={}, i=0, list3 = [], list1 = []
for row in range(orksheet.nrows):
matrix[i] = {}
cell = worksheet.cell(row,2)
distance = cell.value
if float(distance) < 25000:
    list1 = [int(worksheet.cell_value(row,0))]
    list3 = list3.append(list1)
    list2 = [int(worksheet.cell_value(row,1))]
    for l in list1:
        for j in list2:
                matrix[l, j]=math.ceil(worksheet.cell_value(row,2))             
i+=1

This works somewhat. When I use print(l,j,matrix[l,j]

within the loop over l and j I get what I get the desired values. However, using print(matrix) gives the (general, i.e. the output like that but with the corresponding values instead) output:

(Origin, Destination): Distance and sometimes: distance: {}, distance: {}, 

and so on.

What I've perceived is the problem is with the matrix. I cannot understand why it prints like that which I believe has something to do with the lists? The list1 and list2 has len 1 which seems odd to me. I've tried to use list3 to append list1 but it also get len 1.

Regards,


Solution

  • I could not recommend pandas more for data-manipulation tasks.

    For example, the operation you seek in pandas is called pivot:

    In [11]: df = pd.DataFrame({'origin': list('aabbccdd'), 'destination': ['d1', 'd2'] * 4, 'distance': np.arange(8)})
    
    In [12]: df
    Out[12]: 
      destination  distance origin
    0          d1         0      a
    1          d2         1      a
    2          d1         2      b
    3          d2         3      b
    4          d1         4      c
    5          d2         5      c
    6          d1         6      d
    7          d2         7      d
    
    In [13]: df.pivot('origin', 'destination', 'distance')
    Out[13]: 
    destination  d1  d2
    origin             
    a             0   1
    b             2   3
    c             4   5
    d             6   7
    

    And to read actual excel file there's pandas.read_excel which AFAIR uses xlrd under the hood:

    df = read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
    

    And there's a lot more to find in the documentation