
How to use gradient descent on data that has string values?


I want to solve the house price prediction problem (https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data).

How could I transform string data into numerical data in Octave?


Solution

  • The link is paywalled, but its title mentions the word 'categorical', so I'm assuming that by 'numerical' you mean integer labels, rather than parsing a string that represents a number into its equivalent floating-point value.

    With that in mind, here is a typical way to represent a categorical variable: an index vector plus a cell array of labels.

    Indices = [ 1,2,3,2,3,2,1,2,1,2,3,1,3,3,1 ];
    Labels  = { 'class1', 'class2', 'class3' };
    

    It really is as simple as that. If you want this to be a single 'variable', you can collect both parts into a struct:

    % Wrap Labels in braces; a bare cell array would make struct() build a 1x3 struct array
    MyCategoricalVariable = struct( 'indices', Indices, 'labels', {Labels} );
    

    Of course, this depends on how the data is provided to you in the first place. If you're given the raw strings rather than the indices, you can convert them to an indices/labels pair like so:

    Data = { 'a', 'b', 'c', 'c', 'b', 'c', 'b', 'a', 'a', 'a', 'b' };
    Labels = unique( Data );                 % sorted unique strings
    [~, Indices] = ismember( Data, Labels )  % position of each string in Labels
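As a quick check that the indices/labels pair keeps all the information, the original strings can be recovered by indexing the label cell array with the index vector. A minimal sketch, reusing the toy `Data` from the answer and storing the pair in a struct (the braces around `Labels` make `struct` keep the whole cell array in one field rather than building a struct array):

```octave
% Toy categorical data, given as strings
Data = { 'a', 'b', 'c', 'c', 'b', 'c', 'b', 'a', 'a', 'a', 'b' };

% Encode: Labels holds the sorted unique strings, Indices the position of
% each element of Data within Labels
Labels = unique( Data );
[~, Indices] = ismember( Data, Labels );

% Bundle both parts; {Labels} stores the cell array as a single field value
MyCategoricalVariable = struct( 'indices', Indices, 'labels', {Labels} );

% Decode: indexing a cell array with () and an index vector returns a cell array
Recovered = MyCategoricalVariable.labels( MyCategoricalVariable.indices );

% Compare as column vectors so the check does not depend on orientation
disp( isequal( Recovered(:), Data(:) ) )   % prints 1
```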
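Since the question is about gradient descent, note that feeding the integer indices in directly as a feature imposes an artificial ordering (it treats 'class3' as "more" than 'class1'). A common alternative, swapped in here and not part of the answer above, is one-hot (dummy) encoding. A minimal sketch, assuming a plain least-squares linear-regression cost; the target vector `y`, the learning rate `alpha` and the iteration count are made-up values for illustration only:

```octave
% Indices for a categorical variable with 3 labels (as in the answer)
Indices = [ 1,2,3,2,3,2,1,2,1,2,3,1,3,3,1 ];
NumLabels = 3;
m = numel( Indices );

% One-hot encode: row i has a 1 in column Indices(i), zeros elsewhere
OneHot = zeros( m, NumLabels );
OneHot( sub2ind( size( OneHot ), 1:m, Indices ) ) = 1;

% Design matrix: bias column plus one-hot columns, dropping the first
% category so the columns are not collinear with the bias
X = [ ones( m, 1 ), OneHot(:, 2:end) ];

% Hypothetical targets for illustration: 15, 25, 35 for class1..class3
y = 10 * Indices(:) + 5;

% Batch gradient descent on the least-squares cost
% (alpha and the iteration count are assumed, not tuned)
theta = zeros( size( X, 2 ), 1 );
alpha = 0.1;
for iter = 1:5000
  grad  = ( X' * ( X * theta - y ) ) / m;
  theta = theta - alpha * grad;
end
disp( theta' )
```

Dropping one of the one-hot columns keeps the design matrix full rank alongside the bias column. For this toy data `theta` approaches `[15; 10; 20]`: the class1 baseline, then the offsets for class2 and class3.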