Tags: python, dataframe, machine-learning, jupyter-notebook, prediction

Extra column from nowhere in dataframe when trying to extract mode from series


import pandas as pd

# Combine the classifier outputs and the clustering predictions column-wise
finals_preds = pd.concat([clf_preds, clf_pred_probs, ISFOR_clus_preds, SVM_clus_preds, KMEANS_clus_preds, LOCOUT_clus_preds, DBSC_clus_preds], axis=1)
finals_preds.columns = ['clf_class', 'clf_score', 'ISOFOR', 'SVM-1C', 'KMEANS', 'LOCOUT', 'DBSCAN']
finals_preds

This is the output:

[image: first output]

The real problem came when I tried to add another column summarizing the row-wise mode of the clustering predictions: the error says I am trying to fit 2 columns into 1.

# add a column with the row-wise mode of the clustering scores
finals_preds['ENSEMB']= finals_preds[['ISOFOR','SVM-1C','KMEANS','LOCOUT']].mode(axis=1)
finals_preds

Error message:

ValueError: Wrong number of items passed 2, placement implies 1

I then evaluated the right-hand side of the assignment on its own, and the result confused me:

[image: second one]

I also printed the mode of each series individually, and they all look normal, like this:

[image: normal series modes]

So why does an extra column appear when I take the mode across them together?


Solution

  • mode returns the value or values that appear most often. Your table is binary, so each row falls into one of these three cases:

         0    1
    0  0.0  NaN  # You have more 0 than 1 in the first row
    1  1.0  NaN  # You have more 1 than 0 in the second row
    2  0.0  1.0  # You have as many 0 as 1 in the third row
    

    The output has 2 columns as soon as at least one row in the dataframe contains as many 0's as 1's; it has a single column only if no row is tied.

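    If you want a single representative value for each row, take the first column of the result (for a tied row this is the smaller of the tied values, since mode returns them sorted):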

    finals_preds['ENSEMB']= \
        finals_preds[['ISOFOR','SVM-1C','KMEANS','LOCOUT']].mode(axis=1)[0]
    #                                                           HERE ---^^^
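The behaviour described above can be reproduced on a small example. This is only a sketch: the `votes` frame is hypothetical data standing in for the four clustering columns from the question, and the `tied` flag is an extra illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical 0/1 predictions, one row per sample (not the asker's real data).
votes = pd.DataFrame({
    'ISOFOR': [0, 1, 0],
    'SVM-1C': [0, 1, 0],
    'KMEANS': [0, 1, 1],
    'LOCOUT': [1, 0, 1],
})

# Row-wise mode: the last row is a 2-2 tie, so a second column is needed
# to hold both tied values; the other rows get NaN in that column.
modes = votes.mode(axis=1)
print(modes.shape)

# Column 0 is always filled, so it can be assigned to a single new column.
votes['ENSEMB'] = modes[0].astype(int)
print(votes)

# Optional: flag the rows where more than one mode was returned.
tied = modes.notna().sum(axis=1) > 1
print(tied)
```

Assigning `modes` itself to `votes['ENSEMB']` would attempt to place two columns into one, which is the situation the ValueError in the question reports. If tied rows should not default to the smaller value, the `tied` mask gives a place to apply a different rule.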