Tags: python, pandas, dataframe, loops, sklearn-pandas

I'm trying to predict probability of X_test and getting 2 values in an array. I need to compare those 2 values and make it 1



When I run this code:

y_pred = classifier.predict_proba(X_test)
y_pred

It gives output like

array([[0.5, 0.5],
       [0.6, 0.4],
       [0.7, 0.3],
       ...,
       [0.5, 0.5],
       [0.4, 0.6],
       [0.3, 0.7]])

We know that if the value is >= 0.5 it's 1, and if it's less than 0.5 it's 0.

I converted the array above into a pandas DataFrame using the code below:

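import pandas as pd

proba = pd.DataFrame(y_pred)  # y_pred is the array returned by predict_proba
proba.columns = ['pred_0', 'pred_1']  # flat list; a nested list would create a MultiIndex
proba.head()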

The output is:

    pred_0  pred_1
0   0.5     0.5
1   0.6     0.4
2   0.7     0.3
3   0.4     0.6
4   0.3     0.7

How can I iterate over the rows above and apply a condition: if the value in column 1 is greater than or equal to 0.5 when compared with the value in column 2, the result is 1, and if the value in column 1 is less than 0.5, the result is 0?

For example, for the data frame above the output must be:

  output
0 0
1 1
2 1
3 1
4 1

Solution

  • You could map over your initial array directly, without converting it to a pandas DataFrame, so that each subarray yields True when its first value is >= 0.5 and False otherwise. Finally, convert the result to int:

    >>> import numpy as np
    >>> a = np.array([[0.5, 0.5], [0.6, 0.4], [0.3, 0.7]])
    >>> a
    array([[0.5, 0.5],
           [0.6, 0.4],
           [0.3, 0.7]])
    >>> result = map(lambda x:int(x[0] >= 0.5), a)
    >>> print(list(result))
    [1, 1, 0]