python list dataframe classification multiclass-classification

How to build a multiclass list from a csv in Python?

I'm trying to build a list like the following one from a dataframe:

TRAINING_DATA = [
   ["accepted",{"APP": True , "FEE": False, "THY": False}],
   ["change accepted",{"APP": True , "FEE": False, "THY": False}],
   ["yes i approve these changes",{"APP": True , "FEE": False, "THY": False}]
]

In Jupyter I can create it without problems. However, I need to build it from a CSV file. Currently, I'm trying with this content:

text;class
"accepted"; {'APP': True , 'FEE': False, 'THY': False}
"change accepted";{'APP': True , 'FEE': False, 'THY': False}

And, in Python, I load the file using this command:

df = pd.read_csv("prueba.csv", usecols=['text','class'], delimiter=";")

But, as stated in the title, I need to build a list that takes the class column as an object (a dictionary) and not as text. I created the list using this statement:

newList = df.values.tolist()
newList

However, the result is not what I expected:

[['accepted', " {'APP': True , 'FEE': False, 'THY': False}"],
['change accepted', "{'APP': True , 'FEE': False, 'THY': False}"]]

As can be seen, the second "column" of each row is a string. What I need is this (without the quotes):

[['accepted', {'APP': True , 'FEE': False, 'THY': False}],
['change accepted', {'APP': True , 'FEE': False, 'THY': False}]]

It is worth mentioning that I have already tried the following statements:

df['class'] = df['class'].astype(object)
df['class'] = df['class'].astype('category')

But without any success.

What I need to know is how the CSV file should be written, and what treatment should be applied to the dataframe to achieve this.

Solution

What you need to do is convert the string that contains the dictionary into an actual dictionary, which you can do with ast.literal_eval().
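As a quick check of why the astype() attempts in the question don't help, here is a minimal sketch on a one-row frame built by hand (it stands in for what read_csv returns; it is not read from prueba.csv). astype(object) only relabels the column, while ast.literal_eval actually parses the text:

```python
import ast

import pandas as pd

# Hand-built frame mimicking read_csv output: the "class" cell is a str
df = pd.DataFrame({
    "text": ["accepted"],
    "class": ["{'APP': True , 'FEE': False, 'THY': False}"],
})

# astype(object) changes the dtype label only; the cell is still a str
as_object = df["class"].astype(object)
print(type(as_object.iloc[0]))

# ast.literal_eval parses the text as a Python literal and returns a dict
parsed = df["class"].apply(ast.literal_eval)
print(type(parsed.iloc[0]))
```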

The list comprehension that builds data_ loops over each row of data and calls ast.literal_eval on the element at index 1, the string "{'APP': True , 'FEE': False, 'THY': False}", which turns it into a dictionary. Each new row keeps the original text (such as "accepted") at index 0 and holds the parsed dictionary at index 1.

Note: on Python versions before 3.10, ast.literal_eval() raises an IndentationError when the string starts with whitespace, as in " {'APP': True , 'FEE': False, 'THY': False}". Spaces inside the braces are harmless, but strip the leading space (for example with str.strip()) so the string becomes "{'APP': True , 'FEE': False, 'THY': False}", as in the data below.
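A minimal sketch of that stripping step on a single cell, using the value with the leading space from the question's first CSV row:

```python
import ast

# The cell as it came out of the CSV, including the leading space
raw = " {'APP': True , 'FEE': False, 'THY': False}"

# Remove surrounding whitespace first so the call also works before Python 3.10
label = ast.literal_eval(raw.strip())
print(label["APP"])  # True
```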

import ast

data = [
    ["accepted", "{'APP': True , 'FEE': False, 'THY': False}"],
    ["change accepted", "{'APP': True , 'FEE': False, 'THY': False}"],
]

data_ = [[d[0], ast.literal_eval(d[1])] for d in data]
print(data_)

Output:

[['accepted', {'APP': True, 'FEE': False, 'THY': False}], ['change accepted', {'APP': True, 'FEE': False, 'THY': False}]]
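To do the conversion while loading the CSV instead of afterwards, read_csv's converters parameter can run the same parsing on every cell of the class column. A minimal sketch, assuming the file layout from the question (inlined with io.StringIO so it runs without prueba.csv) and that every class cell is a valid Python dict literal:

```python
import ast
import io

import pandas as pd

# Same layout as prueba.csv in the question (note the space after the first ";")
csv_text = """text;class
"accepted"; {'APP': True , 'FEE': False, 'THY': False}
"change accepted";{'APP': True , 'FEE': False, 'THY': False}
"""

# converters applies the function to each raw "class" cell during parsing,
# so the column holds dicts instead of strings; strip() drops the stray space
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    usecols=["text", "class"],
    converters={"class": lambda s: ast.literal_eval(s.strip())},
)

newList = df.values.tolist()
print(newList)
```

With this approach the CSV can stay as written in the question. A cell that is not a valid Python literal makes ast.literal_eval raise, so malformed rows surface while loading rather than slipping through as strings.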