I want to compute the log-likelihood of a logistic regression model.
import numpy as np

def sigma(x):
    return 1 / (1 + np.exp(-x))
def logll(y, X, w):
    """
    Parameters
    ----------
    y : ndarray of shape (N,)
        Binary labels (either 0 or 1).
    X : ndarray of shape (N, D)
        Design matrix.
    w : ndarray of shape (D,)
        Weight vector.
    """
    p = sigma(X @ w)
    y_1 = y @ np.log(p)
    y_0 = (1 - y) @ (1 - np.log(1 - p))
    return y_1 + y_0
logll(y, Xz, np.linspace(-5,5,D))
Applying this function results in
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:16: RuntimeWarning: divide by zero encountered in log
  app.launch_new_instance()
I would expect y_0 to be a negative float. How can I avoid this error and is there a bug somewhere in the code?
Edit 1
X @ w statistics:
Max: 550.775133944
Min: -141.972597608
Sigma(max): 1.0 => Throws error in y_0 in np.log(1 - 1.0)
Sigma(min): 2.19828642169e-62
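So the warning boils down to this one-liner, just evaluating the offending term from y_0 in isolation with the Max value above:

np.log(1 - sigma(550.775133944))   # sigma saturates to exactly 1.0, so this is np.log(0.0) -> -inf plus the RuntimeWarning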
Edit 2
I also have access to this logsigma function that computes sigma in log space:
def logsigma(x):
    return np.vectorize(np.log)(sigma(x))
Unfortunately, I then can't find a way to rewrite y_0. The following is my attempt, but it is obviously not correct.
def l(y, X, w):
    y_1 = np.dot(y, logsigma(X @ w))
    y_0 = (1 - y) @ (1 - np.log(1 - logsigma(X @ w)))
    return y_1 + y_0
First of all, I think you've made a mistake in your log-likelihood formula: it should be a plain sum of y_0 and y_1, not a sum of exponentials:

log L(w) = sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ],   where p_i = sigma(x_i @ w)
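Concretely, the bug in your code is in y_0: it should be (1 - y) @ np.log(1 - p), not (1 - y) @ (1 - np.log(1 - p)). A corrected sketch of your function (this still assumes p never hits exactly 0.0 or 1.0; the numerical issue is addressed below):

def logll(y, X, w):
    p = sigma(X @ w)
    y_1 = y @ np.log(p)
    y_0 = (1 - y) @ np.log(1 - p)   # was: (1 - y) @ (1 - np.log(1 - p))
    return y_1 + y_0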
Division by zero can be caused by large values (large in absolute value) in X @ w: e.g., sigma(-800) is exactly 0.0 on my machine, so np.log of it produces "RuntimeWarning: divide by zero encountered in log". Symmetrically, large positive scores push sigma to exactly 1.0 (as in your Edit 1), so np.log(1 - p) blows up the same way.
Make sure you initialize your weights with small values near zero and that you don't have exploding gradients after several iterations of training.
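If the scores in X @ w really are that large, another option is to compute the log-likelihood directly in log space instead of going through sigma first. Here is a minimal sketch (stable_logll is my own name, not something from your code), using the identities log(sigma(z)) = -log(1 + exp(-z)) and log(1 - sigma(z)) = -log(1 + exp(z)), both available through np.logaddexp:

import numpy as np

def stable_logll(y, X, w):
    z = X @ w                        # raw scores (logits)
    log_p   = -np.logaddexp(0, -z)   # log(sigma(z)) without ever forming sigma(z)
    log_1mp = -np.logaddexp(0, z)    # log(1 - sigma(z))
    return y @ log_p + (1 - y) @ log_1mp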
By the way, here's the code I use for cross-entropy loss, which also works for multi-class problems:
def softmax_loss(x, y):
    """
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
      for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C
    """
    # Subtract the row-wise max before exponentiating for numerical stability,
    # then normalize to get class probabilities.
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))
    probs /= np.sum(probs, axis=1, keepdims=True)
    N = x.shape[0]
    # Average negative log-probability of the correct class.
    return -np.sum(np.log(probs[np.arange(N), y])) / N
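For example (the numbers here are made up, just to show the expected shapes):

np.random.seed(0)
scores = np.random.randn(5, 3)        # N=5 inputs, C=3 classes
labels = np.array([0, 2, 1, 1, 0])    # one integer label in [0, C) per input
print(softmax_loss(scores, labels))   # mean cross-entropy, a single positive float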
UPD: When nothing else helps, there is one more numerical trick (discussed in the comments): compute log(p + epsilon) and log(1 - p + epsilon) with a small positive epsilon value. This ensures that log(0.0) never happens.
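Applied to your logll, that would look roughly like this (1e-12 is just an example value for epsilon):

def logll_eps(y, X, w, eps=1e-12):
    p = sigma(X @ w)
    y_1 = y @ np.log(p + eps)
    y_0 = (1 - y) @ np.log(1 - p + eps)
    return y_1 + y_0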