python pandas weka arff

Letter 'b' appears in data when an ARFF file is loaded into Python


I loaded an ARFF file into Python using this code:

import pandas as pd
from scipy.io import arff
# loadarff accepts a file path directly, so no file handle is left open
datos, meta = arff.loadarff('selectividad.arff')
d = pd.DataFrame(datos)

When I call head() on the data frame, this is how it looks: [image not available]

However, those 'b' prefixes are not present in the ARFF file, as we can see here: https://gyazo.com/3123aa4c7007cb4d6f99241b1fc41bcb What is the problem here? Thank you very much.


Solution

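  • The b prefix means the values are Python bytes objects rather than str: scipy.io.arff.loadarff returns nominal and string attributes as bytes. To decode a single column, apply the following code:

    df['name_column'] = df['name_column'].str.decode('utf-8')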
    

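    To decode every object (bytes) column of a data frame at once, apply:

    str_df = df.select_dtypes([object])
    str_df = str_df.stack().str.decode('utf-8').unstack()
    # write the decoded columns back into the original data frame
    df[str_df.columns] = str_df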
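
The decoding step above can also be wrapped in a small helper. This is only a sketch: the function name decode_bytes_columns and the columns sexo and nota are made up for illustration, and the small data frame stands in for pd.DataFrame(datos), since the contents of the original ARFF file are not known. It decodes only values that are actually bytes, so columns that mix bytes with other objects are left intact.

```python
import pandas as pd


def decode_bytes_columns(df, encoding="utf-8"):
    """Return a copy of df with bytes values in object columns decoded to str."""
    out = df.copy()
    for col in out.select_dtypes([object]).columns:
        # Decode only bytes values; str, NaN and other objects pass through unchanged.
        out[col] = out[col].map(
            lambda v: v.decode(encoding) if isinstance(v, bytes) else v
        )
    return out


# Hypothetical stand-in for pd.DataFrame(datos): nominal attributes arrive as bytes.
raw = pd.DataFrame({
    "sexo": [b"M", b"F", b"M"],
    "nota": [6.5, 7.25, 5.0],
})

clean = decode_bytes_columns(raw)
print(clean["sexo"].tolist())
```

Because the helper works on a copy, the original data frame keeps its bytes values, which can be handy when comparing the two while debugging.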