Tags: machine-learning, logistic-regression, loss, cross-entropy, mean-square-error

Comparing MSE loss and cross-entropy loss in terms of convergence


For a very simple classification problem where I have a target vector [0, 0, 0, ..., 0] and a prediction vector [0, 0.1, 0.2, ..., 1], would cross-entropy loss or MSE loss converge better/faster? When I plot them, it seems to me that MSE loss has a lower error margin. Why would that be?

[plot: per-element MSE vs cross-entropy loss for the all-zeros target]

Or, for example, when I have the target as [1, 1, 1, ..., 1], I get the following:

[plot: per-element MSE vs cross-entropy loss for the all-ones target]
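
For reference, here is a minimal NumPy sketch of how such a per-element comparison might be computed (the clipping constant `eps` is an assumption, added only to keep log(0) finite):

```python
import numpy as np

eps = 1e-7
pred = np.linspace(0, 1, 11)        # predictions [0, 0.1, 0.2, ..., 1]
pred = np.clip(pred, eps, 1 - eps)  # avoid log(0) in cross-entropy

for target in (np.zeros_like(pred), np.ones_like(pred)):
    # per-element squared error
    mse = (pred - target) ** 2
    # per-element binary cross-entropy
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    print("target:", int(target[0]))
    print("  MSE:", np.round(mse, 3))
    print("  BCE:", np.round(bce, 3))
```

Note that on these vectors MSE is bounded by 1, while the cross-entropy terms grow without bound as a prediction moves toward the extreme opposite the target, so the two curves are not directly comparable in magnitude.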


Solution

  • You sound a little confused...

    • Comparing the values of MSE & cross-entropy loss and saying that one is lower than the other is like comparing apples to oranges
    • MSE is for regression problems, while cross-entropy loss is for classification ones; these contexts are mutually exclusive, hence comparing the numerical values of their corresponding loss measures makes no sense
    • When your prediction vector is like [0, 0.1, 0.2, ..., 1] (i.e. with non-integer components), the problem is a regression one, not a classification one; in classification settings, we usually use one-hot encoded target vectors, where only one component is 1 and the rest are 0 (see the sketch after this list)
    • A target vector of [1, 1, 1, ..., 1] could be the case either in a regression setting, or in a multi-label multi-class classification one, i.e. where the output may belong to more than one class simultaneously
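
    A minimal sketch contrasting the two settings (the vectors below are illustrative assumptions, not taken from the question):

```python
import numpy as np

# Classification: one-hot target and a probability vector -> cross-entropy
y_true_onehot = np.array([0, 0, 1, 0])         # only one component is 1
y_pred_proba = np.array([0.1, 0.2, 0.6, 0.1])  # probabilities summing to 1
ce = -np.sum(y_true_onehot * np.log(y_pred_proba))
print("categorical cross-entropy:", round(float(ce), 3))  # -log(0.6) ~ 0.511

# Regression: real-valued target -> MSE
y_true = np.array([1.2, 0.7, 3.4])
y_pred = np.array([1.0, 0.9, 3.0])
mse = np.mean((y_true - y_pred) ** 2)
print("MSE:", round(float(mse), 3))  # 0.08
```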

    On top of these, your plot choice, with the percentage (?) of predictions on the horizontal axis, is puzzling - I have never seen such plots in ML diagnostics, and I am not quite sure what exactly they represent or how they could be useful...

    If you would like a detailed discussion of cross-entropy loss & accuracy in classification settings, you may have a look at this answer of mine.