Tags: machine-learning, classification, multiclass-classification

Is it possible to only use K-1 logits for K-class classification?


For multi-class classification, we use the softmax function to calculate the class probabilities.

In the case K = 2, we have softmax(a)_0 = e^{a_0} / (e^{a_0} + e^{a_1}) = 1 / (1 + e^{a_1 - a_0}) = sigmoid(a_0 - a_1), so softmax reduces to the logistic function and we only need 1 logit.

I'm wondering whether it's possible to use only K-1 logits to model a multi-class classification problem with K classes.


Solution

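  • The question is essentially equivalent to asking "is there a surjective (preferably bijective) function from R^{n-1} onto the probability simplex over n classes?", and the answer is of course positive (strictly speaking, softmax-based maps reach only the interior of the simplex, since every output probability stays positive). Some examples: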

    1. f([x1, ..., xn-1]) = softmax([x1, ..., xn-1, 0])
    2. f([x1, ..., xn-1]) = [sigmoid(x1), (1-sigmoid(x1)) * softmax([x2, ..., xn-1, 0])] (the appended 0 is needed so that the output has n entries)
    

    In general, these constructions introduce an arbitrary asymmetry between the classes, which, by Occam's razor, is something we usually avoid.
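A minimal NumPy sketch of both constructions, assuming plain NumPy; the helper names `pinned_softmax`, `pinned_softmax_inverse` and `sigmoid_then_softmax` are made up for illustration. In example (2) the inner softmax gets an appended 0 logit so that n-1 inputs produce n probabilities. The inverse of example (1), x_i = log(p_i / p_n), shows that it is a bijection onto the interior of the simplex.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pinned_softmax(x):
    """Example (1) (hypothetical helper): append a fixed 0 logit,
    mapping n-1 reals to n probabilities."""
    x = np.asarray(x, dtype=float)
    return softmax(np.append(x, 0.0))

def pinned_softmax_inverse(p):
    """Inverse of example (1) on the open simplex: x_i = log(p_i / p_n)."""
    p = np.asarray(p, dtype=float)
    return np.log(p[:-1] / p[-1])

def sigmoid_then_softmax(x):
    """Example (2) (hypothetical helper): the first class gets sigmoid(x1),
    the remaining mass is split by a softmax with an appended 0 logit."""
    x = np.asarray(x, dtype=float)
    s = sigmoid(x[0])
    return np.concatenate(([s], (1.0 - s) * pinned_softmax(x[1:])))

x = np.array([0.5, -1.0, 2.0])                       # n - 1 = 3 free logits
p = pinned_softmax(x)                                # n = 4 probabilities
q = sigmoid_then_softmax(x)                          # also 4 probabilities
print(p.shape, np.isclose(p.sum(), 1.0))             # (4,) True
print(np.allclose(pinned_softmax_inverse(p), x))     # True
print(q.shape, np.isclose(q.sum(), 1.0))             # (4,) True
```

With a single input (K = 2), `sigmoid_then_softmax` returns [sigmoid(x1), 1 - sigmoid(x1)], since the inner softmax over the lone 0 logit is just [1].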

    Note that

    softmax([-x, 0]) = [e^{-x}/(e^{-x} + e^0), 1/(e^{-x} + 1)] 
                     = [1-sigmoid(x), sigmoid(x)]
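The identity can be checked numerically. This is a quick sketch assuming NumPy, with `softmax` and `sigmoid` written out by hand rather than taken from a library.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-5.0, 5.0, 11)
# One row [-x, 0] per value of x: shape (11, 2)
logits = np.stack([-xs, np.zeros_like(xs)], axis=-1)
lhs = softmax(logits)
rhs = np.stack([1.0 - sigmoid(xs), sigmoid(xs)], axis=-1)
print(np.allclose(lhs, rhs))  # True
```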
    

    So in a sense, solution (1) generalises what you do with the sigmoid in the K=2 case to the K>2 case. Unfortunately, you have to arbitrarily pick which of the dimensions you will substitute with 0.
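How much does that arbitrary choice cost? Softmax is unchanged when the same constant is added to every logit, so any K logits a can be shifted by a_j to make entry j equal to 0, and the remaining K-1 entries describe the same distribution. The choice of pinned dimension therefore changes the parameterization, not the set of distributions that can be represented. A small sketch assuming NumPy; `pin_reference` and `unpin_reference` are hypothetical helper names.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def pin_reference(a, j):
    """Hypothetical helper: shift logits so a[j] becomes 0, then drop it,
    leaving K-1 free parameters."""
    a = np.asarray(a, dtype=float)
    shifted = a - a[j]            # softmax is invariant to this shift
    return np.delete(shifted, j)  # remove the pinned (zero) entry

def unpin_reference(x, j):
    """Hypothetical helper: reinsert the fixed 0 logit at position j."""
    return np.insert(np.asarray(x, dtype=float), j, 0.0)

rng = np.random.default_rng(0)
a = rng.normal(size=5)            # K = 5 unconstrained logits
for j in range(len(a)):
    x = pin_reference(a, j)       # K - 1 = 4 free parameters
    # prints True for every choice of pinned dimension j
    print(j, np.allclose(softmax(unpin_reference(x, j)), softmax(a)))
```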