python · python-3.x · pandas · data-processing

"DataFrame objects are mutable, thus they cannot be hashed" when using Series.unique()


I am having an issue while using Series.unique() on the Titanic dataset.

Calling Series.unique() on the original DataFrame raises no error, but after concatenating the train and test sets on specific columns, calling Series.unique() raises the error above.

From what I have tried, this is caused by the line that replaces null values (the fifth statement below); if I comment out that line, the code runs without any error. Why is that? And is there a workaround?

cat_cols = ['Pclass', 'Sex', 'Embarked']
df_train = pd.read_csv('train.csv')
df_pred = pd.read_csv('test.csv')
df_join = pd.concat([df_train[cat_cols], df_pred[cat_cols]])
df_join = df_join.fillna(df_join.mode, axis=0)  # commenting this line out avoids the error
df_join.Embarked.unique()

The train and test files can be downloaded from:

https://www.kaggle.com/c/titanic/download/test.csv
https://www.kaggle.com/c/titanic/download/train.csv

I am currently using pandas version 0.23.4.


Solution

  • Given:

    import pandas as pd

    cat_cols = ['Pclass', 'Sex', 'Embarked']
    df_train = pd.read_csv('train.csv')
    df_pred = pd.read_csv('test.csv')
    df_join = pd.concat([df_train[cat_cols], df_pred[cat_cols]])
    

    NaN values occur only in the Embarked column, as the output of df_join.info() verifies:

    df_join.info()
    
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 1309 entries, 0 to 417
    Data columns (total 3 columns):
    Pclass      1309 non-null int64
    Sex         1309 non-null object
    Embarked    1307 non-null object
    dtypes: int64(1), object(2)
    memory usage: 80.9+ KB
    

    So, replacing the NaN values with the mode of the Embarked column:

    df_join.Embarked = df_join.Embarked.fillna(df_join.Embarked.mode()[0])
    df_join.Embarked.value_counts().sum()
    # 1309
    

    and looking for unique values:

    df_join.Embarked.unique()
    # array(['S', 'C', 'Q'], dtype=object)
    

    Tip: it is mode()[0], not mode. df_join.mode is the bound method object itself, so fillna stores that method in every NaN cell. When Series.unique() later hashes the values, hashing a bound method hashes the DataFrame it is bound to, and DataFrames are mutable and therefore unhashable, hence the error. Also note that mode() returns a Series (there can be ties), so take its first element with [0].

    Hope this answers your question; if not, leave a comment.