Tags: python, pandas, dataframe, machine-learning, metrics

Trying to find how similar one column is to another in dataframe


I am trying to calculate an accuracy rate.

I have a pandas dataframe with numerous columns of data.

I have one column of predicted churns and one column of true churns for every customer.

Is there a way to calculate accuracy and other metrics using just these two columns? Both columns are binary: 0 means no churn and 1 means churn.


Solution

  • There are many ways to measure the accuracy of a prediction against known answers. Since you tagged this with machine learning and python, I suggest using a confusion matrix (also known as an error matrix) as a first pass. The scikit-learn Python library provides a function for this in its metrics module:

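    from sklearn.metrics import confusion_matrix
    y_true = ...  # actual churn labels (0 or 1)
    y_pred = ...  # predicted churn labels (0 or 1)
    # Rows are the true classes, columns are the predicted classes
    confusion_matrix(y_true, y_pred)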
    

    source: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
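
To show how this might look with two DataFrame columns, here is a minimal sketch. The column names `true_churn` and `pred_churn` and the sample values are assumptions made up for illustration, since the question does not give them. It computes accuracy with pandas alone, then uses scikit-learn for accuracy, precision, recall, F1, and the confusion matrix.

```python
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical data: the column names and values below are assumptions;
# replace them with the real columns from your DataFrame.
df = pd.DataFrame(
    {
        "true_churn": [0, 0, 1, 1, 0, 1, 0, 1],
        "pred_churn": [0, 1, 1, 0, 0, 1, 0, 1],
    }
)

y_true = df["true_churn"]
y_pred = df["pred_churn"]

# Accuracy with pandas alone: the fraction of rows where the columns agree.
accuracy_pandas = (y_true == y_pred).mean()

# The same accuracy, plus other common binary metrics, via scikit-learn.
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # of predicted churners, how many churned
recall = recall_score(y_true, y_pred)        # of actual churners, how many were caught
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# For binary 0/1 labels the matrix is [[tn, fp], [fn, tp]],
# so ravel() unpacks it in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("accuracy (pandas): ", accuracy_pandas)
print("accuracy (sklearn):", accuracy)
print("precision:", precision, "recall:", recall, "f1:", f1)
print("tn, fp, fn, tp:", tn, fp, fn, tp)
```

If churn is rare in your data, accuracy alone can look good even for a model that never predicts churn, so precision and recall are usually worth checking alongside it.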