machine-learning linear-regression gradient-descent

Represent Linear Regression features in Gradient Descent numerically

The following Python code works well for running gradient descent on a linear regression problem:

import numpy as np

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)         # predictions for all m samples
        loss = hypothesis - y                 # residuals
        cost = np.sum(loss ** 2) / (2 * m)    # half the mean squared error
        print("Iteration %d | Cost: %f" % (i, cost))
        gradient = np.dot(xTrans, loss) / m   # gradient averaged over the m samples
        theta = theta - alpha * gradient      # step against the gradient
    return theta

Here, x is the m × n feature matrix, where m is the number of samples and n is the number of features.
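To make the shapes concrete, here is a minimal sketch of one gradient step on a small numeric example. The data values and the choice of a leading column of ones as a bias term are assumptions made purely for illustration:

```python
import numpy as np

# Hypothetical data: m = 4 samples, n = 2 features.
# The first column of ones is an assumed bias term.
m, n = 4, 2
x = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])            # shape (m, n)
y = np.array([1.0, 3.0, 5.0, 7.0])    # shape (m,)
theta = np.zeros(n)                   # shape (n,)

hypothesis = np.dot(x, theta)         # shape (m,)
loss = hypothesis - y                 # shape (m,)
gradient = np.dot(x.T, loss) / m      # shape (n,)
```

Every entry of x has to be a number for np.dot to work, which is why string-valued features need to be encoded first.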

However, suppose my features are non-numerical (say, director and genre) and I have 2 movies. My feature matrix may then look like:

[['Peter Jackson', 'Action'],
 ['Sergio Leone', 'Comedy']]

In such a case, how can I map these features to numerical values and apply gradient descent?

Solution

You can map each distinct feature value to a number of your choice and then apply gradient descent the usual way.

In Python you can use pandas to do that easily:

import pandas as pd
# X is the list of [director, genre] rows from the question
df = pd.DataFrame(X, columns=['director', 'genre'])
df.director = df.director.map({'Peter Jackson': 0, 'Sergio Leone': 1})
df.genre = df.genre.map({'Action': 0, 'Comedy': 1})

As you can see, writing these mappings by hand quickly becomes tedious as the number of distinct values grows, so it may be better to write code that builds the mapping dynamically.
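One possible way to build the mapping dynamically is sketched below, assuming the two-movie matrix from the question. It shows two options: pd.factorize, which assigns integer codes in order of first appearance, and pd.get_dummies, which creates one 0/1 column per distinct value (one-hot encoding). The variable names codes, one_hot and x are chosen here for illustration only:

```python
import pandas as pd

# Feature matrix from the question (2 movies, 2 categorical features).
X = [['Peter Jackson', 'Action'],
     ['Sergio Leone', 'Comedy']]
df = pd.DataFrame(X, columns=['director', 'genre'])

# Option 1: integer codes, assigned per column in order of first appearance.
codes = pd.DataFrame({col: pd.factorize(df[col])[0] for col in df.columns})

# Option 2: one-hot encoding, one 0/1 column per distinct value.
one_hot = pd.get_dummies(df, dtype=float)

# Numeric m x n matrix that can be passed to a gradient descent routine.
x = one_hot.to_numpy()
```

Note that integer codes impose an arbitrary order on the categories (a linear model would treat 'Sergio Leone' = 1 as "larger" than 'Peter Jackson' = 0), so one-hot encoding is often the safer choice for unordered features, at the cost of more columns.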