Tags: python, matplotlib, plot, scikit-learn, logistic-regression

Plotting an implicit function on top of scatter plots (decision boundary in logistic regression)


I am fitting a logistic regression in Python to separate data into two classes. The model has 28 features, all derived from 2 original features: every product of their powers up to total degree 6 (e.g. x_0^1 x_1^5, x_0^6), including the constant term. When the boundary is a line, plotting it is straightforward, but I could not find out how to plot a non-linear boundary on top of scatter plots.
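
For reference, the feature expansion described above could be sketched as follows. The helper name `map_features` is hypothetical, and the column order (power of x in the outer loop, power of y in the inner loop) is an assumption chosen to match the term order of `bnd_func` in the code of this post, not something confirmed by the original fitting code.

```python
import numpy as np

def map_features(x, y, degree=6):
    """Return all monomials x**i * y**j with i + j <= degree as columns.

    Hypothetical helper: the column order (i outer, j inner) is an assumption.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    columns = [x**i * y**j
               for i in range(degree + 1)       # power of x
               for j in range(degree + 1 - i)]  # power of y
    return np.column_stack(columns)

# Two sample points; degree 6 yields 28 columns, including the constant term.
features = map_features([2.0, 0.5], [3.0, -1.0])
print(features.shape)
```

Under that assumed ordering, the boundary function is simply the product of this matrix with the parameter vector, i.e. `map_features(x, y) @ th`.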

My attempt was to solve the boundary equation at each x using scipy.optimize, but the result was highly unsatisfactory:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize
from functools import partial

plt.scatter(data['X1'][data['Y'] == 0], data['X2'][data['Y'] == 0], c='r', marker='o')  # data points where Y==0
plt.scatter(data['X1'][data['Y'] == 1], data['X2'][data['Y'] == 1], c='b', marker='+')  # data points where Y==1
def bnd_func(x, y):  # boundary function using the 'th' parameter vector
    return th[0] + th[1]*y + th[2]*y**2 + th[3]*y**3 + th[4]*y**4 + th[5]*y**5 + th[6]*y**6 + th[7]*x + th[8]*x*y + th[9]*x*y**2 + th[10]*x*y**3 + th[11]*x*y**4 + th[12]*x*y**5 + th[13]*x**2 + th[14]*x**2*y + th[15]*x**2*y**2 + th[16]*x**2*y**3 + th[17]*x**2*y**4 + th[18]*x**3 + th[19]*x**3*y + th[20]*x**3*y**2 + th[21]*x**3*y**3 + th[22]*x**4 + th[23]*x**4*y + th[24]*x**4*y**2 + th[25]*x**5 + th[26]*x**5*y + th[27]*x**6
xs = np.linspace(-1, 1.25, 200)
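xa, ya = [], []
for x in xs:
    try:
        # Solve bnd_func(x, y) = 0 for y, starting from the initial guess y = 0
        y = scipy.optimize.newton(partial(bnd_func, x), 0, maxiter=10000, tol=10**(-5))
    except (RuntimeError, ValueError):  # newton raises RuntimeError when it fails to converge
        pass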
    else:
        xa.append(x)
        ya.append(y)
plt.plot(xa, ya)
plt.show()


The plotted boundary is missing its upper part, presumably because the solver returns only one root for each x. Changing the initial guess might recover it, but that is an inelegant solution. I have also tried sympy, but I could not overlay the scatter plots on its plot. Is there a way to achieve this? I do not mind using other packages if necessary.
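
The missing branch can be illustrated on a simpler curve. The sketch below uses the unit circle, x^2 + y^2 - 1 = 0, as a stand-in for the fitted boundary (an assumption made purely for illustration). For a fixed x the equation is a polynomial in y, so `numpy.roots` returns every intersection with the vertical line, whereas a single Newton run from one starting point converges to at most one of them.

```python
import numpy as np
from scipy.optimize import newton

def circle(x, y):
    # Stand-in implicit function (assumption); it is zero on the unit circle.
    return x**2 + y**2 - 1.0

x = 0.6

# Newton's method from a single initial guess finds one root only.
y_newton = newton(lambda y: circle(x, y), 0.5)

# For fixed x the equation is y**2 + (x**2 - 1) = 0, a polynomial in y,
# so numpy.roots returns both intersections with the vertical line.
roots = np.roots([1.0, 0.0, x**2 - 1.0])
y_all = np.sort(roots[np.isreal(roots)].real)

print(y_newton)  # one branch of the curve
print(y_all)     # both branches of the curve
```

Solving for every root at every x would work in principle, but it still needs bookkeeping to connect the branches into a curve, which is why a grid-based approach tends to be simpler.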

Another question: how can this be achieved if I had used sklearn.linear_model instead? I know how to retrieve the coefficients, but I am still not sure how to draw the boundary on the original scatter plot.


Solution

  • I solved the problem by using a contour plot instead:

    # Plot the data points, then draw the decision boundary on top
    plt.scatter(data['X1'][data['Y'] == 0], data['X2'][data['Y'] == 0], c='r', marker='o')
    plt.scatter(data['X1'][data['Y'] == 1], data['X2'][data['Y'] == 1], c='b', marker='+')
    plt.axis('scaled')
    def bnd_func(x, y):
        return th[0] + th[1]*y + th[2]*y**2 + th[3]*y**3 + th[4]*y**4 + th[5]*y**5 + th[6]*y**6 + th[7]*x + th[8]*x*y + th[9]*x*y**2 + th[10]*x*y**3 + th[11]*x*y**4 + th[12]*x*y**5 + th[13]*x**2 + th[14]*x**2*y + th[15]*x**2*y**2 + th[16]*x**2*y**3 + th[17]*x**2*y**4 + th[18]*x**3 + th[19]*x**3*y + th[20]*x**3*y**2 + th[21]*x**3*y**3 + th[22]*x**4 + th[23]*x**4*y + th[24]*x**4*y**2 + th[25]*x**5 + th[26]*x**5*y + th[27]*x**6
    # Grid spanning the range of both original features
    xax = np.arange(data['X1'].min(), data['X1'].max(), 0.05)
    yax = np.arange(data['X2'].min(), data['X2'].max(), 0.05)
    x_grid, y_grid = np.meshgrid(xax, yax)
    # bnd_func works elementwise, so the result already has the grid's shape
    z_grid = bnd_func(x_grid, y_grid)
    # The decision boundary is the zero level set of bnd_func
    plt.contour(x_grid, y_grid, z_grid, levels=[0])
    plt.show()
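
The same contour approach should carry over to the scikit-learn case raised in the question. The following is a minimal sketch, not the original poster's code: the synthetic data, the variable names, the hyperparameters (`C`, `max_iter`) and the output file name are all assumptions made for illustration. The model's `decision_function` is evaluated on a grid and its zero level is drawn, which corresponds to a predicted probability of 0.5.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): class 1 inside a circle, class 0 outside.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 0.5).astype(int)

# Degree-6 polynomial expansion followed by logistic regression.
model = make_pipeline(
    PolynomialFeatures(degree=6),
    LogisticRegression(C=10.0, max_iter=5000),
)
model.fit(X, y)

# Evaluate the decision function on a grid covering the data.
xax = np.linspace(X[:, 0].min(), X[:, 0].max(), 200)
yax = np.linspace(X[:, 1].min(), X[:, 1].max(), 200)
x_grid, y_grid = np.meshgrid(xax, yax)
grid_points = np.column_stack([x_grid.ravel(), y_grid.ravel()])
z_grid = model.decision_function(grid_points).reshape(x_grid.shape)

plt.scatter(X[y == 0, 0], X[y == 0, 1], c='r', marker='o')
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='b', marker='+')
plt.contour(x_grid, y_grid, z_grid, levels=[0])  # boundary where p = 0.5
plt.axis('scaled')
plt.savefig("decision_boundary.png")  # hypothetical output file name
```

Putting the feature expansion inside the pipeline means the grid points go through exactly the same transformation as the training data, so the coefficients never have to be matched to terms by hand.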
    
