machine-learning, logistic-regression

Logistic Regression Use Case


Problem: Given the list of movies a specific user has watched, calculate the probability that they will watch any given movie.

Approach: This seems like a typical logistic regression use case (please correct me if I'm wrong).

Initial Logistic Regression code (please correct if something is wrong):

import numpy as np

def sigmoid(x):
    # np.exp is element-wise, so this works for scalars and numpy arrays alike
    # (math.exp would raise a TypeError for array inputs)
    return 1 / (1 + np.exp(-x))

def gradientDescentLogistic(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = sigmoid(np.dot(x, theta))
        loss = hypothesis - y
        # The ONLY difference between linear and logistic is the definition of hypothesis
        gradient = np.dot(xTrans, loss) / m
        theta = theta - alpha * gradient
    return theta
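
For reference, a minimal way to exercise the code above; the design matrix, labels, and hyperparameters here are made up purely for illustration:

# Hypothetical toy data: 4 samples x 3 columns (intercept + two numeric features)
X = np.array([[1., 0.2, 0.7],
              [1., 0.9, 0.1],
              [1., 0.4, 0.8],
              [1., 0.8, 0.3]])
y = np.array([1., 0., 1., 0.])
theta = np.zeros(X.shape[1])
theta = gradientDescentLogistic(X, y, theta, alpha=0.1, m=len(y), numIterations=1000)
print(sigmoid(X.dot(theta)))  # predicted probabilities for the training rows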

Now the features here can be different actors, different genres, etc. I'm unable to figure out how to fit these kinds of parameters into the above code.


Solution

  • Why is this not a use case of LR?

    I would say that this is not a typical use case for logistic regression. Why? Because you only know what someone watched; you only have positive samples, and you do not know what someone deliberately chose not to watch. Obviously, if I watched movies {m1, m2, m3}, then I did not watch M\{m1, m2, m3}, where M is the set of all movies in the history of mankind, but treating all of those as negatives is not a good assumption. I did not watch most of them because I do not own them, do not know about them, or simply have not yet had time for them. In such a case you can only model this as a one-class problem or as a kind of density estimation (I assume you do not have access to any knowledge other than the list of movies seen, so we cannot, for example, do collaborative filtering or other crowd-based analysis).
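
    As a hedged aside (not part of the original argument), the one-class framing could look like this with scikit-learn's OneClassSVM, assuming each movie has already been turned into a numeric feature vector; the data here is random filler:

        import numpy as np
        from sklearn.svm import OneClassSVM

        # Hypothetical data: rows are movies the user watched,
        # columns are numeric features (e.g. genre indicators)
        rng = np.random.default_rng(0)
        X_watched = rng.random((50, 5))

        # Fit a one-class model on positive samples only
        clf = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale").fit(X_watched)

        # Higher decision scores = more similar to the user's watch history
        X_candidates = rng.random((10, 5))
        scores = clf.decision_function(X_candidates)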

    Why not generate negative samples by hand?

    Obviously, you could, for example, randomly select movies from some database that the user has not seen and assume that (s)he does not want to see them. But this is just an arbitrary, abstract assumption, and your model will be heavily biased by this procedure. For example, if you took all unseen movies as negative samples, then a "correct" model would simply learn to say "yes" only for the training set and "no" for everything else. If you randomly sample m movies, it will only learn to distinguish your taste from these m movies, but they can represent anything, in particular movies that one would love to see. To sum up: you can do this, and to be honest it could even work in some particular applications, but from a probabilistic perspective it is not a valid approach, as you build unjustifiable assumptions into the model.
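
    For concreteness, a minimal sketch of that negative-sampling shortcut (again assuming scikit-learn and pre-computed movie feature vectors; this is the approach cautioned against above, shown only to make the built-in assumption explicit):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X_watched = rng.random((50, 5))   # positives: movies the user watched
        X_unseen = rng.random((200, 5))   # pool of movies the user never saw

        # Arbitrary assumption: m randomly sampled unseen movies are "disliked"
        m = 50
        X_negative = X_unseen[rng.choice(len(X_unseen), size=m, replace=False)]

        X = np.vstack([X_watched, X_negative])
        y = np.concatenate([np.ones(len(X_watched)), np.zeros(m)])

        # The learned probabilities are only as meaningful as the sampling assumption
        model = LogisticRegression().fit(X, y)
        probs = model.predict_proba(X_unseen)[:, 1]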

    How could I approach this?

    So what can you do to handle it in a probabilistic manner? You can, for example, represent your movies as numerical features (some characteristics), so that you have a cloud of points in some space R^d (where d is the number of features extracted). Then you can fit any distribution to these points, such as a Gaussian distribution (a radial one if d is big), a GMM, or any other density model. This gives you a clear (easy to understand and "defend") model of P(user will watch | x).
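
    A minimal sketch of this density-estimation idea with a GMM (assuming scikit-learn; feature extraction is out of scope here, so the vectors are made up):

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        X_watched = rng.random((50, 5))   # feature vectors of watched movies

        # Fit a mixture density to the positive samples only
        gmm = GaussianMixture(n_components=3, covariance_type="full").fit(X_watched)

        # Log-density of candidate movies under the user's "taste" distribution;
        # higher values mean a movie looks more like what the user watches.
        # Turning this into a calibrated P(watch | x) needs an extra step
        # (e.g. thresholding or normalization), which is not shown here.
        X_candidates = rng.random((10, 5))
        log_density = gmm.score_samples(X_candidates)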