machine-learning, time-complexity, logistic-regression

What is the Search/Prediction Time Complexity of Logistic Regression?


I am looking into the time complexities of Machine Learning algorithms and I cannot find the time complexity of Logistic Regression for predicting a new input. I have read that for classification it is O(c*d), c being the number of classes and d being the number of dimensions, and I know that for Linear Regression the search/prediction time complexity is O(d). Could you explain the search/prediction time complexity of Logistic Regression? Thank you in advance.

Examples for other Machine Learning algorithms: https://www.thekerneltrip.com/machine/learning/computational-complexity-learning-algorithms/


Solution

  • Complexity of training for logistic regression methods with gradient-based optimization: O((f+1)csE), where:

    • f - number of features (+1 because of the bias). Each feature is multiplied by its weight (f operations, +1 for the bias), and another f + 1 operations are needed to sum everything up (obtaining the prediction). Using a gradient method to improve the weights costs the same number of operations, so in total we get 4 * (f + 1) (two sets of f + 1 for the forward pass, two for the backward pass), which is simply O(f + 1).
    • c - number of classes (possible outputs) of your logistic regression. Each class has its own set of weights. For binary classification c is 1, so this term drops out.
    • s - number of samples in your dataset; this one is quite intuitive.
    • E - number of epochs you are willing to run gradient descent for (full passes through the dataset).

    Note: this complexity can change with things like regularization (another c operations), but that is the idea behind it; see the training sketch below.
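    To make the operation counts concrete, here is a minimal NumPy sketch of full-batch gradient descent for multiclass logistic regression (function and variable names such as train_logreg, lr and n_epochs are illustrative, not taken from any library):

    ```python
    import numpy as np

    def train_logreg(X, y_onehot, n_epochs=100, lr=0.1):
        """Full-batch gradient descent for (multiclass) logistic regression.

        X        : (s, f) array -> s samples, f features
        y_onehot : (s, c) array -> c classes, one-hot encoded
        Per epoch the work is O((f + 1) * c * s); over E epochs: O((f + 1) * c * s * E).
        """
        s, f = X.shape
        c = y_onehot.shape[1]
        W = np.zeros((f, c))       # one weight vector per class
        b = np.zeros(c)            # the "+1" bias per class

        for _ in range(n_epochs):                          # E epochs
            logits = X @ W + b                             # s * c dot products of length f (+ bias)
            probs = np.exp(logits - logits.max(axis=1, keepdims=True))
            probs /= probs.sum(axis=1, keepdims=True)      # softmax activation
            grad = probs - y_onehot                        # softmax + cross-entropy gradient shortcut
            W -= lr * (X.T @ grad) / s                     # backward pass: again ~ s * c * f operations
            b -= lr * grad.mean(axis=0)
        return W, b
    ```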

    Complexity of prediction for one sample: O((f+1)c)

    • f + 1 - you simply multiply each weight by the value of its feature, add the bias, and sum it all together at the end.
    • c - you do this for every class; it is 1 for binary predictions (a sketch follows below).
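    A minimal sketch of that single-sample prediction (illustrative names, assuming the trained W and b from the sketch above):

    ```python
    import numpy as np

    def predict_one(x, W, b):
        """x: (f,) features, W: (f, c) weights, b: (c,) biases.

        Each of the c classes costs f multiplications plus the bias and the sum,
        i.e. O(f + 1), so one sample costs O((f + 1) * c) in total.
        """
        logits = x @ W + b                 # c dot products of length f, plus bias
        return int(np.argmax(logits))      # softmax is monotonic, so argmax of logits is enough
    ```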

    Complexity of predictions for many samples: O((f+1)cs)

    • (f+1)c - see the complexity for one sample
    • s - number of samples (a vectorized sketch follows below)
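    Vectorized over s samples (same illustrative names as above), this is just the single-sample work repeated s times:

    ```python
    import numpy as np

    def predict_many(X, W, b):
        """X: (s, f) samples, W: (f, c), b: (c,) -> O((f + 1) * c * s) operations."""
        return np.argmax(X @ W + b, axis=1)    # s * c dot products of length f
    ```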

    The difference between logistic and linear regression in terms of complexity: the activation function.

    For multiclass logistic regression it is softmax, while linear regression, as the name suggests, has a linear activation (effectively no activation). This does not change the complexity in big-O notation, but it is another c*f operations during training (left out above to keep the picture uncluttered), multiplied by 2 for backprop.
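    A small sketch of that difference (illustrative names; at prediction time the extra exponentials and normalization of softmax are cheap compared to the c * (f + 1) dot products, so the big-O stays the same):

    ```python
    import numpy as np

    def linear_output(x, W, b):
        return x @ W + b                   # linear regression: raw affine output, no activation

    def softmax_output(x, W, b):
        z = x @ W + b                      # same O((f + 1) * c) affine step
        e = np.exp(z - z.max())            # extra O(c) exponentials
        return e / e.sum()                 # normalization -> class probabilities
    ```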