I am trying to write a program that takes each cell in the first column of an Excel file (a last name) and searches for that string within the text of the adjacent cell in the same row.
Currently, my code reads as follows:
import xlrd
workbook = xlrd.open_workbook("C:\Python27\Doc\Book3.xls")
worksheet = workbook.sheet_by_name("Sheet1")
num_rows = worksheet.nrows - 1
num_cells = worksheet.ncols - 1
curr_row = -1
while curr_row < num_rows:
    curr_row += 1
    row = worksheet.row(curr_row)
    curr_cell = 2
    while curr_cell < num_cells:
        curr_cell += 1
        cell_value = worksheet.cell_value(curr_row, curr_cell)
sh = workbook.sheet_by_index(0)
first_col = sh.col_values(2)
second_col = sh.col_values(3)
L = [first_col]
L1 = [second_col]
for i, j in enumerate(L):
    if j in L1[i]:
        print j
    else:
        print 'no'
My code seems to "work" when I generate the lists by hand (i.e. just a test list of L = ['a', 'b', 'c'] and L1 = ['Today a cat a', 'Today b cat b'], etc.), but when I use xlrd to create the lists, all I get is a single "no" printout, which is very confusing. I assume this has something to do with either the way the lists are indexed or the size of the lists (16,000 names in column A, about 5,000,000 words of text in column B).
Any help/tips that can be offered would be very much appreciated. I have seen lots of approaches to similar tasks around the web (and on here), but I have no idea how to integrate the different approaches into something that would be effective for me.
Many thanks
Give it a try:
import xlrd
workbook = xlrd.open_workbook("input.xls")
worksheet = workbook.sheet_by_name("Sheet1")
for row in xrange(worksheet.nrows):
    value_first = worksheet.cell_value(row, 0)
    value_second = worksheet.cell_value(row, 1)
    if value_first in value_second:
        print row
    else:
        print 'no'
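As for why the original version printed a single "no": col_values() already returns a list, so L = [first_col] builds a one-element list whose only item is the whole column. The loop then runs once and asks whether that entire list is an item of second_col, which is False. Here is a rough sketch of the same idea on plain lists, so it runs without a spreadsheet. The helper name match_rows and the sample data are made up for illustration; with xlrd you would pass in sh.col_values(...) for whichever columns hold your names and text.

```python
def match_rows(names, texts):
    """Pair each name with the text in the same row and record whether it occurs there."""
    results = []
    for row, (name, text) in enumerate(zip(names, texts)):
        # An empty cell comes back as '', and '' in text is always True, so skip blanks.
        found = bool(name) and name in text
        results.append((row, name, found))
    return results


# Stand-ins for the two lists returned by sh.col_values(...) in the question.
first_col = ['a', 'b', 'c']
second_col = ['Today a cat a', 'Today b cat b', 'Today x dog x']

# The question's version: wrapping the list gives a one-element list,
# so the loop body runs once and compares a whole list against each string.
L = [first_col]
L1 = [second_col]
print(len(L))          # 1
print(L[0] in L1[0])   # False, which is the single 'no'

# Iterating over the columns directly checks every row.
for row, name, found in match_rows(first_col, second_col):
    print(name if found else 'no')
```

Note that this is a plain substring test, so a short name will also match inside a longer word. If that matters for your data you would need something stricter, such as a word-boundary check.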