python regex list dictionary naivebayes

Dictionary List Manipulation for ML Training in Python


I'm reading an Excel file using this code:

from xlrd import open_workbook

book = open_workbook('excel_demo.xlsx')
sheet = book.sheet_by_index(0)

# read header values into the list
keys = [sheet.cell(0, col_index).value for col_index in xrange(sheet.ncols)]

dict_list = []
for row_index in xrange(1, sheet.nrows):
  d = {keys[col_index]: sheet.cell(row_index, col_index).value 
     for col_index in xrange(sheet.ncols)}
  dict_list.append(d)

print dict_list

The output I get is a list of dictionaries, as shown below:

  [{'A': 1.0, 'C': 3.0, 'B': 2.0}, 
  {'A': 5.0, 'C': 7.0, 'B': 6.0}]

I need to pass this list to my Naive Bayes algorithm as a training set, so I need something like the following:

train_data = [({"a": 4, "b": 1, "c": 0}, "1:0"),
({"a": 5, "b": 2, "c": 1}, "2:1"),
({"a": 0, "b": 3, "c": 4}, "3:4"),
({"a": 5, "b": 1, "c": 1}, "1:1"),
({"a": 1, "b": 4, "c": 3}, "4:3")]

How do I achieve this conversion in Python? Would regex help in this case? Many thanks.


Solution

  • Let l be your original Excel data, i.e. the list of dictionaries (dict_list in the question):

    t = [(r, "".join((str(r['B']),":",str(r['C'])))) for r in l]
    

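    t is a list of (dict, label) tuples in the shape you describe. Note that xlrd returns numeric cells as floats, so the labels come out as "2.0:3.0" and the dictionary keys keep their original case; wrap the values in int() if you need "2:3". Regex is not needed for this.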
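
    If you want the result to match the target format more closely (integer values, lowercase keys, labels like "2:3"), here is a minimal sketch in Python 3. The function name to_training_set and the label_keys parameter are made up for illustration, and the int() cast assumes every cell holds a whole number; the sample rows are the ones printed in the question.

```python
def to_training_set(rows, label_keys=("B", "C")):
    """Turn a list of row dicts into (features, label) tuples.

    Hypothetical helper: assumes every value is a whole number
    stored as a float, which is how xlrd returns numeric cells.
    """
    train_data = []
    for row in rows:
        # Lowercase the keys and cast the float values to int.
        features = {key.lower(): int(value) for key, value in row.items()}
        # Build the label from the chosen columns, e.g. "2:3".
        label = ":".join(str(int(row[key])) for key in label_keys)
        train_data.append((features, label))
    return train_data


# Sample rows copied from the output shown in the question.
dict_list = [{'A': 1.0, 'C': 3.0, 'B': 2.0},
             {'A': 5.0, 'C': 7.0, 'B': 6.0}]

train_data = to_training_set(dict_list)
print(train_data)
```

    Plain string joining is enough here because the label is built from values you already have as separate fields; regex would only be useful if you had to pull them out of free text.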