python arrays pandas csv quotation-marks

How to remove quote characters around array when loading a csv in Python?


I am having trouble removing quote characters that appear around my arrays. When I read in my file like this:

data = pd.read_csv('filepath.csv', sep='|', index_col=0, nrows=5)

the dtype of my problematic column is object, and each individual entry is a string rather than a list:

print(type(data.body_tokens[0]))
data.body_tokens[0]
<class 'str'>
"['he', 'knows', 'what', 'he', 's', 'doing']"

How can I remove the quotation marks around the array?


Solution

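  • Use ast.literal_eval, which parses a string containing a Python literal into the corresponding object:

    import ast
    
    string = "['he', 'knows', 'what', 'he', 's', 'doing']"
    
    # Avoid naming the result `list`, which would shadow the built-in type.
    tokens = ast.literal_eval(string)
    
    type(tokens)    # <class 'list'>
    
    print(tokens)   # ['he', 'knows', 'what', 'he', 's', 'doing']

    Is this what you are looking for?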
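
To convert the whole column rather than a single string, the same ast.literal_eval call can be applied while reading the file or afterwards. The sketch below is only an illustration: the inline csv_text is a hypothetical stand-in for 'filepath.csv', and it assumes every cell in body_tokens holds a valid Python list literal.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for 'filepath.csv': a pipe-separated file whose
# body_tokens column holds the string form of a Python list.
csv_text = (
    "id|body_tokens\n"
    "0|['he', 'knows', 'what', 'he', 's', 'doing']\n"
    "1|['fine', 'by', 'me']\n"
)

# Option 1: parse the column while reading, via read_csv's converters argument.
data = pd.read_csv(
    io.StringIO(csv_text),
    sep="|",
    index_col=0,
    converters={"body_tokens": ast.literal_eval},
)

# Option 2: read the column as plain strings, then convert it afterwards.
raw = pd.read_csv(io.StringIO(csv_text), sep="|", index_col=0)
raw["body_tokens"] = raw["body_tokens"].apply(ast.literal_eval)

first = data.loc[0, "body_tokens"]
print(type(first))
print(first)
```

Both options should leave real list objects in the column. Empty or missing cells are not valid literals and would raise an error in ast.literal_eval, so they need separate handling if the real file contains any.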