python, scikit-learn, logistic-regression, polynomial-math

How to implement polynomial logistic regression in scikit-learn?


I'm trying to create a non-linear logistic regression, i.e. a polynomial logistic regression, using scikit-learn, but I couldn't find how to set the degree of the polynomial. Has anybody tried this? Thanks a lot!


Solution

  • For this you will need to proceed in two steps. Let us assume you are using the iris dataset (so you have a reproducible example):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    
    data = load_iris()
    X = data.data
    y = data.target
    X_train, X_test, y_train, y_test = train_test_split(X, y)  # pass random_state=... for a reproducible split
    

    Step 1

    First you need to convert your data to polynomial features. Originally, our data has 4 columns:

    X_train.shape
    >>> (112, 4)
    

    You can create the polynomial features with scikit-learn (here for degree 2):

    poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
    X_poly = poly.fit_transform(X_train)
    X_poly.shape
    >>> (112, 14)
    

    We now have 14 features: the original 4, their squares, and the 6 pairwise interaction terms.
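
    If you want to double-check which features were generated, PolynomialFeatures can list them. This is a minimal sketch; it assumes scikit-learn >= 1.0 (where the method is called get_feature_names_out) and uses the iris feature names purely for illustration:

    # inspect the generated feature names (requires scikit-learn >= 1.0)
    feature_names = poly.get_feature_names_out(data.feature_names)
    print(len(feature_names))  # 14
    print(feature_names)       # originals, squares, and pairwise products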

    Step 2

    You can now build your logistic regression on these polynomial features, passing X_poly instead of X_train:

    lr = LogisticRegression()  # consider max_iter=1000 if the solver warns about convergence on the unscaled features
    lr.fit(X_poly, y_train)
    

    Note: if you then want to evaluate your model on the test data, you need to apply the same polynomial transformation to X_test before scoring:

    lr.score(poly.transform(X_test), y_test)
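
    As a quick, purely illustrative sanity check (not part of the original answer; exact numbers depend on the random split), you can compare this against a plain logistic regression fitted on the raw features to see whether the polynomial terms help:

    # hypothetical baseline: plain (degree-1) logistic regression on the raw features
    baseline = LogisticRegression(max_iter=1000)  # max_iter raised as a precaution against convergence warnings
    baseline.fit(X_train, y_train)
    print("raw features:       ", baseline.score(X_test, y_test))
    print("polynomial features:", lr.score(poly.transform(X_test), y_test))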
    

    Putting everything together in a Pipeline (optional)

    You may instead want to use a Pipeline, which chains the two steps into a single object and saves you from handling intermediate results yourself:

    pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
    pipe.fit(X_train, y_train)
    pipe.score(X_test, y_test)
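
    Since the original question is about choosing the degree, a natural extension (not part of the answer above, just a sketch) is to treat the degree as a hyperparameter and tune it with GridSearchCV on the pipeline; the candidate degrees below are arbitrary examples:

    from sklearn.model_selection import GridSearchCV

    pipe = Pipeline([
        ('polynomial_features', PolynomialFeatures(include_bias=False)),
        ('logistic_regression', LogisticRegression(max_iter=1000)),
    ])

    # parameters of a pipeline step are addressed as <step_name>__<parameter>
    param_grid = {'polynomial_features__degree': [1, 2, 3]}
    search = GridSearchCV(pipe, param_grid, cv=5)
    search.fit(X_train, y_train)

    print(search.best_params_)           # e.g. {'polynomial_features__degree': 2}
    print(search.score(X_test, y_test))  # test accuracy of the refitted best pipeline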