Parsing unicode string read from a cell in an xlrd.Book object


I am trying to parse some Unicode text from an Excel 2007 cell read with xlrd (actually xlsxrd).
For some reason xlrd attaches "text: " to the beginning of the Unicode string, which makes it difficult for me to convert it to another type. I eventually want to reverse the order of the words in the string, since it is a name and will be sorted alphabetically with several others. Any help would be greatly appreciated, thanks.

Here is a simple example of what I'm trying to do:

>>> import xlrd, xlsxrd
>>> book = xlsxrd.open_workbook('C:\\fileDir\\fileName.xlsx')
>>> book.sheet_names()
[u'Sheet1', u'Sheet2']
>>> sh = book.sheet_by_index(1)
>>> print sh
<xlrd.sheet.Sheet object at 0x(hexaddress)>
>>> name = sh.cell(0, 0)
>>> print name
text: u'First Last'

From here I would like to parse "name", either exchanging 'First' with 'Last' or just separating the two for storage in two different variables, but every attempt I have made to type cast the Unicode gives an error. Perhaps I am going about it the wrong way? Thanks in advance!


Solution

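  • sh.cell(0, 0) returns an xlrd Cell object rather than the string itself, and the "text: " prefix is only part of how that object prints. I think you need

    name = sh.cell(0, 0).value
    

    to get the unicode object. Then, to split it into two variables, you can obtain a list with the first and last name, using a single space as the separator: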

    split_name = name.split(' ')
    print split_name
    

    This gives [u'First', u'Last']. You can easily reverse the list in place (reverse() modifies the list and returns None, so do not assign its result back):

    split_name.reverse()
    print split_name
    

    giving [u'Last', u'First'].
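
    Since the end goal is alphabetical ordering by last name, here is a rough sketch of how the split and reorder steps might be combined. The helper name last_first and the sample names are made up for illustration; in the real script each string would come from something like sh.cell(row, 0).value. It also assumes the last word of each name is the surname, which will not hold for every name.

```python
def last_first(full_name):
    """Return 'Last First' for a 'First Last' style name (hypothetical helper)."""
    parts = full_name.split()  # no argument: split on any run of whitespace
    if len(parts) < 2:
        # Nothing to reorder for an empty or single-word name.
        return full_name.strip()
    # Move the final word to the front; keep the remaining words in order.
    return " ".join([parts[-1]] + parts[:-1])


# Sample values standing in for what the cell .value attributes would hold.
names = ["First Last", "Ada Lovelace", "Grace Brewster Hopper"]

# Reorder every name, then sort the reordered strings alphabetically.
sorted_names = sorted(last_first(n) for n in names)
print(sorted_names)
```

    Calling split() with no argument, rather than split(' '), avoids empty strings in the list when a cell has leading, trailing, or doubled spaces.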