r, machine-learning, regression, subset, logistic-regression

Build logistic regression models for each PAIR of 10 classes


I am working on the MNIST digit recognizer dataset.

Here, I have 10 class labels, and I want to build and compare models for all pairs of classes, i.e., to run 10C2 = 45 logistic regression models and compare them. I know I could use combn(unique(mnist$label), 2, function(x) , simplify = TRUE) in a loop and write the model in the function. But I'm stuck here.

loglist <- list()
for(i in unique(mnist$label)){ 
        tmp <- try(append(loglist, glm(label~.,family=binomial(link=logit),
                   data = mnist[mnist$label == i, ])))
        if (class(tmp) != "try-error") loglist <- append(loglist, tmp)
} 

Any help or suggestions would be appreciated. Thank you.


Solution

  • There are three ways to use logistic regression models for multiple (10 in your case) classes:

    1. One vs rest
    2. One vs One

    These two methods are explained well on Wikipedia and in Andrew Ng's video lectures.
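    The one-vs-one scheme is what the question asks for: one binary model per pair of classes. A minimal sketch follows, assuming a data frame with a label column and numeric feature columns. The toy data, the helper fit_pair, and the names models and accuracy are illustrative and not from the question. On the real MNIST pixels, glm() may warn about perfect separation or return NA coefficients for constant pixel columns, so some feature filtering or regularization would probably be needed.

```r
# Synthetic stand-in for the MNIST data frame: a `label` column plus features.
# Three classes keep the example fast; with 10 labels combn() yields 45 pairs.
set.seed(1)
n <- 300
mnist <- data.frame(label = sample(0:2, n, replace = TRUE))
mnist$x1 <- rnorm(n) + mnist$label        # shift features by class so the
mnist$x2 <- rnorm(n) - 0.5 * mnist$label  # classes are partly separable

# All unordered pairs of class labels, as a list of length-2 vectors.
pairs <- combn(sort(unique(mnist$label)), 2, simplify = FALSE)

# Fit one binary logistic regression on the rows belonging to a given pair.
fit_pair <- function(p) {
  d <- mnist[mnist$label %in% p, ]
  d$y <- as.integer(d$label == p[2])  # 1 = second class of the pair
  d$label <- NULL                     # keep the original label out of `y ~ .`
  glm(y ~ ., family = binomial(link = "logit"), data = d)
}

models <- lapply(pairs, fit_pair)
names(models) <- vapply(pairs, paste, character(1), collapse = "_vs_")

# One simple way to compare the pairwise models: in-sample accuracy at 0.5.
accuracy <- vapply(models, function(m) {
  mean((fitted(m) > 0.5) == (m$y == 1))
}, numeric(1))
print(accuracy)
```

    In-sample accuracy is only one possible comparison; held-out accuracy or AIC(m) per model would follow the same vapply pattern.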

    A third approach is softmax regression, which generalizes logistic regression to classification problems where the class label y can take more than two possible values. A good tutorial can be found at the given link.
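    As a rough illustration of softmax (multinomial logistic) regression in R, the sketch below uses multinom() from the nnet package, which ships with standard R installations. This is one possible implementation, not necessarily the one the tutorial uses, and the toy data frame and its columns are made up for the example.

```r
library(nnet)  # provides multinom(), a multinomial log-linear model

# Synthetic three-class data standing in for the digit labels (assumption).
set.seed(2)
n <- 300
lab <- sample(0:2, n, replace = TRUE)
toy <- data.frame(
  label = factor(lab),          # multinom() expects a factor response
  x1 = rnorm(n) + lab,
  x2 = rnorm(n) - 0.5 * lab
)

# A single model covers all classes at once instead of one model per pair.
fit <- multinom(label ~ x1 + x2, data = toy, trace = FALSE)

# Per-class probabilities: one row per observation, one column per class.
probs <- predict(fit, newdata = toy, type = "probs")
# Most likely class for each observation.
pred <- predict(fit, newdata = toy, type = "class")

head(round(probs, 3))
table(predicted = pred, actual = toy$label)
```

    Each row of probs sums to 1, which is the property that makes softmax suitable for mutually exclusive classes such as digits.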

    So, which model should you use, and when?

    This depends on whether the classes are mutually exclusive. For example, if your four classes are classical, country, rock, and jazz, then assuming each of your training examples is labeled with exactly one of these four class labels, you should build a softmax classifier.

    If however your categories are has_vocals, dance, soundtrack, pop, then the classes are not mutually exclusive; for example, there can be a piece of pop music that comes from a soundtrack and in addition has vocals. In this case, it would be more appropriate to build 4 binary logistic regression classifiers. This way, for each new musical piece, your algorithm can separately decide whether it falls into each of the four categories.
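    The non-exclusive case can be sketched as one independent binary model per tag. The songs data frame, the feature columns f1 and f2, and the simulated tag values are all hypothetical; only the four tag names come from the example above.

```r
# Synthetic multi-label data: each song can carry any subset of the four tags.
set.seed(3)
n <- 200
songs <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
tags <- c("has_vocals", "dance", "soundtrack", "pop")
for (tag in tags) {
  # Each tag is drawn independently, so several can be 1 for the same song.
  songs[[tag]] <- rbinom(n, 1, plogis(0.8 * songs$f1 - 0.5 * songs$f2))
}

# One binary logistic regression per tag, each using the same features.
models <- lapply(tags, function(tag) {
  glm(reformulate(c("f1", "f2"), response = tag),
      family = binomial, data = songs)
})
names(models) <- tags

# For a new piece, every classifier decides separately.
new_song <- data.frame(f1 = 0.3, f2 = -1.2)
probs <- vapply(models, predict, numeric(1),
                newdata = new_song, type = "response")
print(round(probs, 3))
```

    Unlike the softmax output, these four probabilities are not constrained to sum to 1, since a piece may belong to several categories or to none.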