Tags: machine-learning, scikit-learn, normalize

sklearn log_loss: what does the normalize parameter do?


A rather trivial question: What does the parameter "normalize" for sklearn's log_loss metric do?

According to the documentation: "normalize : bool, optional (default=True) If true, return the mean loss per sample. Otherwise, return the sum of the per-sample losses." My understanding is that it controls whether the 1/N factor is included, i.e. True gives the average and False gives the sum:

logloss = -(1/N) * sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]
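
For concreteness, a quick sketch of the two modes (the labels and predicted probabilities below are made up for illustration):

```python
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]  # predicted P(y=1) for each sample

mean_loss = log_loss(y_true, y_prob, normalize=True)   # default: mean per-sample loss
sum_loss = log_loss(y_true, y_prob, normalize=False)   # sum of per-sample losses

print(mean_loss)               # ~0.1976
print(sum_loss)                # N * mean_loss = ~0.7906
print(sum_loss / len(y_true))  # equals mean_loss
```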

If so, optimizing one is equivalent to optimizing the other, so why would we prefer one over the other? In other words, what is the point of having the parameter at all? Personal preference?


Solution

  • While minimising f(x) and minimising (1/N) f(x) are equivalent, the meaning of constants changes once you deal with objectives of the form f(x) + alpha g(x) vs. (1/N) f(x) + alpha g(x), which is what happens when you are learning, for example, regularised logistic regression: the equivalent alpha in the second case is 1/N times the previous alpha. So there is no single right choice here; it simply depends on the application. Sometimes the mean is better suited (when you need invariance to the sample size) and sometimes the sum.
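
A toy numerical sketch of that scaling effect (the quadratic data-fit term f, the ridge penalty g, the data a, and the value of alpha are all illustrative, not from the question; scipy is used only to find the minimisers):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
a = rng.normal(loc=2.0, scale=1.0, size=100)  # N = 100 "observations"
N, alpha = len(a), 5.0

f = lambda x: np.sum((x - a) ** 2)  # sum-form data-fit term
g = lambda x: x ** 2                # regulariser

# Sum-form objective with alpha vs. mean-form objective with alpha / N:
x_sum = minimize_scalar(lambda x: f(x) + alpha * g(x)).x
x_mean = minimize_scalar(lambda x: f(x) / N + (alpha / N) * g(x)).x

print(x_sum, x_mean)  # same minimiser: the "equivalent alpha" is alpha / N
```

With the same alpha in both forms, the two objectives would have different minimisers; dividing alpha by N in the mean-form objective recovers the sum-form solution exactly, which is the answer's point about constants changing meaning.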