python dataframe machine-learning jupyter-notebook prediction

Extra column from nowhere in dataframe when trying to extract mode from series

finals_preds = pd.concat([clf_preds, clf_pred_probs, ISFOR_clus_preds, SVM_clus_preds,
                          KMEANS_clus_preds, LOCOUT_clus_preds, DBSC_clus_preds], axis=1)
finals_preds.columns = ['clf_class', 'clf_score', 'ISOFOR', 'SVM-1C', 'KMEANS', 'LOCOUT', 'DBSCAN']
finals_preds

Then this is the output

The real problem comes when I try to add another column that summarizes the row-wise mode of these series: the error says I am trying to fit 2 columns into 1.

# add a column for all the scores
finals_preds['ENSEMB'] = finals_preds[['ISOFOR', 'SVM-1C', 'KMEANS', 'LOCOUT']].mode(axis=1)
finals_preds

Error message:

ValueError: Wrong number of items passed 2, placement implies 1

Then I checked the right side of the code, which confused me:

I also printed the mode of each series individually; they all look normal, like this:

So why is there an extra column when I compute the mode across them together?

Solution

mode returns the value or values that appear most often. You have a binary table, so each row falls into one of these three cases:

     0    1
0  0.0  NaN  # You have more 0 than 1 in the first row
1  1.0  NaN  # You have more 1 than 0 in the second row
2  0.0  1.0  # You have as many 0 as 1 in the third row

If at least one row in the dataframe has as many 0's as 1's, the output has 2 columns; it has a single column only when no row is tied. Assigning that two-column result to the single column 'ENSEMB' is what raises the ValueError.
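The behaviour is easy to reproduce on a toy table. The sketch below uses made-up votes (the `votes` frame and its values are illustrative assumptions, not the asker's data) and only reuses the column names from the question:

```python
import pandas as pd

# Hypothetical binary votes from four detectors (illustrative data only)
votes = pd.DataFrame(
    {
        "ISOFOR": [0, 1, 0],
        "SVM-1C": [0, 1, 0],
        "KMEANS": [0, 1, 1],
        "LOCOUT": [1, 0, 1],
    }
)

# Row 2 has two 0's and two 1's, so it has two modes and forces a second column
row_modes = votes.mode(axis=1)
print(row_modes.shape)

# Without the tied row, every row has exactly one mode and one column is enough
no_tie_modes = votes.iloc[:2].mode(axis=1)
print(no_tie_modes.shape)
```

The second column of row_modes is NaN for every row that has a single mode, which matches the three cases listed above.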

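If you want a single representative value for each row, keep only the first column of the result. The modes of each row are returned in sorted order, so a tied row resolves to 0: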

finals_preds['ENSEMB']= \
    finals_preds[['ISOFOR','SVM-1C','KMEANS','LOCOUT']].mode(axis=1)[0]
#                                                           HERE ---^^^
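If you also want to know which rows were ties before collapsing them, one option is to count the non-NaN modes per row. This is a minimal sketch on made-up data; the `votes` frame and the `is_tie` name are assumptions for illustration, not part of the original code:

```python
import pandas as pd

# Hypothetical binary votes from four detectors (illustrative data only)
votes = pd.DataFrame(
    {
        "ISOFOR": [0, 1, 0],
        "SVM-1C": [0, 1, 0],
        "KMEANS": [0, 1, 1],
        "LOCOUT": [1, 0, 1],
    }
)

row_modes = votes.mode(axis=1)

# A row with more than one non-NaN mode is a tie
is_tie = row_modes.notna().sum(axis=1) > 1

# Column 0 always holds a mode; in a tie it holds the smaller value
votes["ENSEMB"] = row_modes[0].astype(int)

print(votes)
print(is_tie)
```

Whether a tie should count as 0, as 1, or be handled separately depends on what the labels mean in your ensemble, so the flag is kept apart from the ENSEMB column here.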