How to use gradient descent on data that has string values?

I want to solve the house price prediction problem (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data).

How could I transform string data into numerical data in Octave?

Solution

The link is paywalled, but its title mentions the word 'categorical', so I'm assuming that by 'numerical' you mean integer labels, rather than parsing a string that represents a number into its equivalent float.

So with that in mind, here's a typical way to represent a categorical variable: a vector of integer indices, plus a cell array mapping each index to its label.

Indices = [ 1,2,3,2,3,2,1,2,1,2,3,1,3,3,1 ];
Labels  = { 'class1', 'class2', 'class3' };

It really is as simple as that. If you really want this to be a single 'variable', you can collect it into a struct:

% Wrap the cell array in braces; otherwise struct() expands it into a 1x3 struct array
MyCategoricalVariable = struct( 'indices', Indices, 'labels', {Labels} );
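
To get from the indices back to the original strings, you can index the labels with the indices. Here is a minimal sketch, reusing the variables above; the name Strings is just something I made up for illustration:

```octave
Indices = [ 1,2,3,2,3,2,1,2,1,2,3,1,3,3,1 ];
Labels  = { 'class1', 'class2', 'class3' };

% The extra braces around Labels keep the cell array as one field value,
% so the result is a single (1x1) struct
MyCategoricalVariable = struct( 'indices', Indices, 'labels', {Labels} );

% Index the labels with the indices to get one string per observation
Strings = MyCategoricalVariable.labels( MyCategoricalVariable.indices );
```

Strings is then a cell array with one label per entry of Indices.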

Obviously it depends on how the data is provided to you in the first place. If you're given the raw strings rather than the indices, you can convert them to an indices/labels pair like so:

Data = { 'a', 'b', 'c', 'c', 'b', 'c', 'b', 'a', 'a', 'a', 'b' };
Labels = unique( Data );
[~, Indices] = ismember( Data, Labels )
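
Since the question is about gradient descent: feeding the integer indices in directly implies an ordering between the categories ('c' > 'b' > 'a') that may not exist. One common option, which may or may not suit your particular model, is to expand the indices into one column per label (one-hot encoding). A rough sketch, assuming the Data above; OneHot is a name I picked for illustration:

```octave
Data = { 'a', 'b', 'c', 'c', 'b', 'c', 'b', 'a', 'a', 'a', 'b' };
Labels = unique( Data );
[~, Indices] = ismember( Data, Labels );

% Compare the column of indices against the row 1:K (broadcasting):
% row i gets a 1 in column Indices(i) and 0 everywhere else
OneHot = double( Indices(:) == 1:numel( Labels ) );
```

Each row of OneHot then contains exactly one 1, and the matrix can be appended to your other numeric features.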