
Is there a way to build a logistic regression model even if there is only one class?


Is there a way to build a scikit-learn logistic regression model for only one class? Obviously such a model would predict the same class every time, regardless of the input data. My models currently use liblinear as the solver; I'm not sure whether another solver would allow this.

I realize this is a very strange question for ML, but I am building many hierarchical models, and in my situation it is easier to have a model for every case even if it predicts the same class every time.

Background: I have a hierarchical prediction task where I'm trying to predict three parts of a 9-digit code (e.g. for code = 001010424, part 1 = 001, part 2 = 01, part 3 = 0424). To do this I'm building hierarchical models. Using the input data we first predict part 1; then, using the highest-confidence decision for part 1, we use the input data again in a part 2 model that is specific to the part 1 code. For example, if I run the part 1 model and get a prediction that part 1 = 001, I then go to the part 2 model for 001, which is trained on (and predicts) part 2 given part 1 = 001. This hierarchical behavior is repeated for part 3.
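The cascade described above can be sketched roughly as follows. This is a hypothetical illustration, not code from the question: the function name `predict_code` and the idea of storing the downstream models in dicts keyed by the upstream prediction are assumptions made for clarity.

```python
def predict_code(x, part1_model, part2_models, part3_models):
    """Cascade three classifiers: each downstream model is chosen
    by the prediction of the model(s) above it.

    part2_models is assumed to be a dict keyed by the part 1 code,
    and part3_models a dict keyed by the (part 1, part 2) pair.
    """
    p1 = part1_model.predict([x])[0]            # e.g. "001"
    p2 = part2_models[p1].predict([x])[0]       # model specific to part 1 = p1
    p3 = part3_models[(p1, p2)].predict([x])[0] # model specific to (p1, p2)
    return p1 + p2 + p3                         # e.g. "001010424"
```

Because some (part 1, part 2) branches in the training data may contain only a single part 3 value, every node in this tree needs *some* model object, which is what motivates the question.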


Solution

  • Scikit-learn needs samples of at least two classes.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
    
      x = np.random.rand(5, 2)        # 5 samples, 2 features
      y = np.ones(5).astype(int)      # every label is the same class
      model = LogisticRegression().fit(x, y)  # raises ValueError
    

    This yields the error:

      ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1
    

    You are probably better off having your training code check whether there is only one y label and, if so, simply memorize that label. Such a check is straightforward to implement and much easier to understand for anyone reading the code later.
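    One way to realize that suggestion is a thin wrapper that memorizes the label when only one class is present and otherwise delegates to `LogisticRegression`. This is a minimal sketch; the class name `SingleClassSafeLogReg` and its attribute names are illustrative, not part of scikit-learn.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      class SingleClassSafeLogReg:
          """Memorize the label if only one class is seen; else fit normally."""

          def fit(self, X, y):
              self.classes_ = np.unique(y)
              if len(self.classes_) == 1:
                  # Degenerate case: remember the single label, skip fitting.
                  self.model_ = None
                  self.constant_ = self.classes_[0]
              else:
                  self.model_ = LogisticRegression(solver="liblinear").fit(X, y)
              return self

          def predict(self, X):
              if self.model_ is None:
                  return np.full(len(X), self.constant_)
              return self.model_.predict(X)

    Alternatively, scikit-learn's `DummyClassifier` (e.g. with `strategy="most_frequent"`) can be fit on single-class data and could serve as the fallback model, if you prefer to keep everything inside the library's estimator API.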