I would like to calculate the cost for softmax regression. The cost function to calculate is given at the bottom of the page.
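For reference, the cost being computed below is the standard softmax cross-entropy; this formula is reconstructed from the code, with $s = XW + b$ denoting the score matrix:

$$J = -\frac{1}{N}\sum_{i=1}^{N} \log\frac{e^{s_{i,y_i}}}{\sum_{c=1}^{3} e^{s_{i,c}}}$$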
With numpy I can compute the cost as follows:
"""
X.shape = (300, 2) # floats
y.shape = (300,)   # integer class labels in {0, 1, 2}
W.shape = (2, 3)
b.shape = (3,)
"""
import numpy as np
np.random.seed(100)
# Data and labels
X = np.random.randn(300,2)
y = np.ones(300)
y[0:100] = 0
y[200:300] = 2
y = y.astype(np.int64) # np.int is deprecated; int64 is the explicit equivalent
# weights and bias
W = np.random.randn(2,3)
b = np.random.randn(3)
N = X.shape[0]
scores = np.dot(X, W) + b
hyp = np.exp(scores - np.max(scores, axis=1, keepdims=True)) # subtract each row's max for numerical stability
probs = hyp / np.sum(hyp, axis=1, keepdims=True) # normalize over the 3 classes (axis=1), not over the 300 samples
logprobs = np.log(probs[range(N),y])
cost_data = -1/N * np.sum(logprobs)
print("hyp.shape = {}".format(hyp.shape)) # hyp.shape = (300, 3)
print(cost_data)
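As a sanity check, the same value can be computed with scipy's log_softmax (assuming scipy >= 1.5, where scipy.special.log_softmax is available):

from scipy.special import log_softmax
cost_check = -np.mean(log_softmax(scores, axis=1)[range(N), y])
print(cost_check) # should match cost_data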
But when I tried to do the same in PyTorch, I could not get it to work. So far I have:
"""
X.shape = (300, 2) # floats
y.shape = (300,)   # integer class labels in {0, 1, 2}
W.shape = (2, 3)
b.shape = (3,)
"""
import numpy as np
import torch
from torch.autograd import Variable
np.random.seed(100)
# Data and labels
X = np.random.randn(300,2)
y = np.ones(300)
y[0:100] = 0
y[200:300] = 2
y = y.astype(np.int64) # int64 so that torch.from_numpy produces a LongTensor
X = Variable(torch.from_numpy(X),requires_grad=True).type(torch.FloatTensor)
y = Variable(torch.from_numpy(y),requires_grad=True).type(torch.LongTensor)
# weights and bias
W = Variable(torch.randn(2,3),requires_grad=True)
b = Variable(torch.randn(3),requires_grad=True)
N = X.shape[0]
scores = torch.mm(X, W) + b
hyp = torch.exp(scores - torch.max(scores, dim=1, keepdim=True)[0]) # row-wise max for stability
probs = hyp / torch.sum(hyp, dim=1, keepdim=True) # normalize over the classes
correct_probs = probs[range(N),y] # got problem HERE
# logprobs = np.log(correct_probs)
# cost_data = -1/N * torch.sum(logprobs)
# print(cost_data)
I have a problem calculating the correct-class probabilities.
How can I solve this and get the correct cost value?
Your problem is that you cannot index with a Python range(N) in PyTorch; use a LongTensor of row indices instead, e.g. torch.arange(N).long(). (A plain slice like probs[0:N, y] would not do what you want either: mixing a slice with an integer-array index follows numpy semantics and yields an (N, N) tensor rather than the (N,) vector of correct-class probabilities.)
hyp = torch.exp(scores - torch.max(scores, dim=1, keepdim=True)[0]) # row-wise max for stability
probs = hyp / torch.sum(hyp, dim=1, keepdim=True) # normalize over the classes
idx = torch.arange(N).long() # LongTensor [0, 1, ..., N-1] in place of range(N)
correct_probs = probs[idx, y] # problem solved
logprobs = torch.log(correct_probs)
cost_data = -1/N * torch.sum(logprobs)
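For reference, PyTorch's built-in torch.nn.functional.cross_entropy computes this same mean negative log-likelihood directly from the unnormalized scores, with the numerical stabilization handled internally:

import torch.nn.functional as F
cost_builtin = F.cross_entropy(scores, y) # same value as cost_data, up to numerical precision
print(cost_builtin)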
Another point is that your labels y do not require gradients; you are better off with:
y = Variable(torch.from_numpy(y),requires_grad=False).type(torch.LongTensor)
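Alternatively, torch.gather picks out the correct-class probabilities without fancy indexing, which some find easier to read:

correct_probs = probs.gather(1, y.view(-1, 1)).squeeze(1) # (N,) vector, same result as probs[idx, y]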