machine-learning · linear-regression · gradient-descent

Represent Linear Regression features in Gradient Descent numerically


The following piece of Python code works well for performing gradient descent:

import numpy as np

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        # predictions for the current parameters
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # halved mean squared error, printed to monitor convergence
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # gradient of the cost with respect to theta
        gradient = np.dot(xTrans, loss) / m
        theta = theta - alpha * gradient
    return theta

Here, x is an m × n feature matrix, where m is the number of samples and n is the number of features.
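For reference, a minimal call with purely numeric features might look like this (the data values and hyperparameters below are invented for illustration):

import numpy as np

# four samples with two numeric features each (made-up values)
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([3.0, 3.0, 7.0, 7.0])

m, n = x.shape
theta = np.zeros(n)  # start from all-zero parameters
theta = gradientDescent(x, y, theta, alpha=0.01, m=m, numIterations=100)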

However, if my features are non-numerical (say, the director and genre of 2 movies), then my feature matrix may look like:

[['Peter Jackson', 'Action'],
 ['Sergio Leone', 'Comedy']]

In such a case, how can I map these features to numerical values and apply gradient descent?


Solution

  • You can map your features to numerical values of your choice and then apply gradient descent the usual way.

    In Python you can use pandas to do that easily:

    import pandas as pd

    # X is the raw feature matrix from the question
    X = [['Peter Jackson', 'Action'],
         ['Sergio Leone', 'Comedy']]
    df = pd.DataFrame(X, columns=['director', 'genre'])
    df.director = df.director.map({'Peter Jackson': 0, 'Sergio Leone': 1})
    df.genre = df.genre.map({'Action': 0, 'Comedy': 1})
    

    As you can see, writing these mappings by hand gets tedious as the number of features and distinct values grows, so it is usually better to write a small piece of code that builds them dynamically, as in the sketch below.
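    A minimal sketch of that dynamic approach, assuming pandas is available: pd.factorize assigns an integer code to each distinct value in a column, so the mapping dictionaries don't have to be written by hand (the sample data is the one from the question):

    import numpy as np
    import pandas as pd

    X = [['Peter Jackson', 'Action'],
         ['Sergio Leone', 'Comedy']]
    df = pd.DataFrame(X, columns=['director', 'genre'])

    # encode every string column with integer codes discovered from the data
    for col in df.select_dtypes(include='object').columns:
        df[col], uniques = pd.factorize(df[col])
        print(col, '->', dict(enumerate(uniques)))  # show the learned mapping

    x = df.to_numpy(dtype=float)  # numeric feature matrix, ready for gradientDescent

    Note that integer codes impose an arbitrary ordering that a linear model will treat as meaningful; one-hot encoding (e.g. pd.get_dummies) is a common alternative when that is a concern.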