Tags: regression, matlab, gradient-descent

Gradient descent doesn't converge


Here is my own implementation of the gradient descent algorithm in MATLAB:

m = height(data_training); % number of training samples
cols = {'x1', 'x2', 'x3', 'x4', 'x5', 'x6',...
    'x7', 'x8', 'x9', 'x10', 'x11', 'x12', 'x13', 'x14', 'x15'}; % feature columns

y = data_training{:, {'y'}}';           % 1 x m row vector of outputs
X = [ones(m,1) data_training{:,cols}]'; % (n+1) x m matrix; first row is the intercept term

theta = zeros(1,width(data_training));  % 1 x (n+1) parameter vector

alpha = 1e-2; % learning rate
iter = 400;   % number of iterations

dJ = zeros(1,width(data_training)); % gradient of the cost w.r.t. each theta(j)

J_seq = zeros(1, iter); % cost history, one value per iteration

for n = 1:iter

    err = (theta*X - y); % 1 x m vector of prediction errors

    % gradient of the cost with respect to each parameter
    for j = 1:width(data_training)
        dJ(j) = 1/m*sum(err*X(j,:)');
    end

    % cost function (half the mean squared error)
    J = 1/(2*m)*sum((theta*X-y).^2);

    % simultaneous update of all parameters
    theta = theta - alpha.*dJ;

    J_seq(n) = J;

    % redraw the cost history every 100 iterations
    if mod(n,100) == 0
        plot(1:iter, J_seq);
    end
end

EDIT: WORKING ALGORITHM

I have applied this algorithm to the following training dataset. The last column is the output variable; the other 15 columns are the features. (A sketch showing how this file can be read into data_training follows the data below.)

For a reason unknown to me, when I plot the cost function J after 50 iterations to check whether it is heading towards convergence, I see that it does not converge. Can you help me understand? Is the implementation wrong, or do I need to do something else?

36    27    71     8.1    3.34    11.4    81.5    3243     8.8    42.6    11.7     21     15     59    59     921.87
35    23    72    11.1    3.14      11    78.8    4281     3.6    50.7    14.4      8     10     39    57     997.88
44    29    74    10.4    3.21     9.8    81.6    4260     0.8    39.4    12.4      6      6     33    54     962.35
47    45    79     6.5    3.41    11.1    77.5    3125    27.1    50.2    20.6     18      8     24    56     982.29
43    35    77     7.6    3.44     9.6    84.6    6441    24.4    43.7    14.3     43     38    206    55     1071.3
53    45    80     7.7    3.45    10.2    66.8    3325    38.5    43.1    25.5     30     32     72    54     1030.4
43    30    74    10.9    3.23    12.1    83.9    4679     3.5    49.2    11.3     21     32     62    56      934.7
45    30    73     9.3    3.29    10.6      86    2140     5.3    40.4    10.5      6      4      4    56     899.53
36    24    70       9    3.31    10.5    83.2    6582     8.1    42.5    12.6     18     12     37    61     1001.9
36    27    72     9.5    3.36    10.7    79.3    4213     6.7      41    13.2     12      7     20    59     912.35
52    42    79     7.7    3.39     9.6    69.2    2302    22.2    41.3    24.2     18      8     27    56     1017.6
33    26    76     8.6     3.2    10.9    83.4    6122    16.3    44.9    10.7     88     63    278    58     1024.9
40    34    77     9.2    3.21    10.2      77    4101      13    45.7    15.1     26     26    146    57     970.47
35    28    71     8.8    3.29    11.1    86.3    3042    14.7    44.6    11.4     31     21     64    60     985.95
37    31    75       8    3.26    11.9    78.4    4259    13.1    49.6    13.9     23      9     15    58     958.84
35    46    85     7.1    3.22    11.8    79.9    1441    14.8    51.2    16.1      1      1      1    54      860.1
36    30    75     7.5    3.35    11.4    81.9    4029    12.4      44      12      6      4     16    58     936.23
15    30    73     8.2    3.15    12.2    84.2    4824     4.7    53.1    12.7     17      8     28    38     871.77
31    27    74     7.2    3.44    10.8      87    4834    15.8    43.5    13.6     52     35    124    59     959.22
30    24    72     6.5    3.53    10.8    79.5    3694    13.1    33.8    12.4     11      4     11    61     941.18
31    45    85     7.3    3.22    11.4    80.7    1844    11.5    48.1    18.5      1      1      1    53     891.71
31    24    72       9    3.37    10.9    82.8    3226     5.1    45.2    12.3      5      3     10    61     871.34
42    40    77     6.1    3.45    10.4    71.8    2269    22.7    41.4    19.5      8      3      5    53     971.12
43    27    72       9    3.25    11.5    87.1    2909     7.2    51.6     9.5      7      3     10    56     887.47
46    55    84     5.6    3.35    11.4    79.7    2647      21    46.9    17.9      6      5      1    59     952.53
39    29    76     8.7    3.23    11.4    78.6    4412    15.6    46.6    13.2     13      7     33    60     968.66
35    31    81     9.2     3.1      12    78.3    3262    12.6    48.6    13.9      7      4      4    55     919.73
43    32    74    10.1    3.38     9.5    79.2    3214     2.9    43.7      12     11      7     32    54     844.05
11    53    68     9.2    2.99    12.1    90.6    4700     7.8    48.9    12.3    648    319    130    47     861.83
30    35    71     8.3    3.37     9.9    77.4    4474    13.1    42.6    17.7     38     37    193    57     989.26
50    42    82     7.3    3.49    10.4    72.5    3497    36.7    43.3    26.4     15     10     34    59     1006.5
60    67    82      10    2.98    11.5    88.6    4657    13.6    47.3    22.4      3      1      1    60     861.44
30    20    69     8.8    3.26    11.1    85.4    2934     5.8      44     9.4     33     23    125    64     929.15
25    12    73     9.2    3.28    12.1    83.1    2095       2    51.9     9.8     20     11     26    50     857.62
45    40    80     8.3    3.32    10.1    70.3    2682      21    46.1    24.1     17     14     78    56     961.01
46    30    72    10.2    3.16    11.3    83.2    3327     8.8    45.3    12.2      4      3      8    58     923.23
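
For reference, here is a minimal sketch of how a whitespace-separated file like the one above could be loaded into the data_training table that the code expects. The file name training_data.txt is only an assumption for illustration; the variable names must match the cols cell array and the 'y' output column used in the code.

% Load the whitespace-separated data into a table whose variable names
% match those used by the gradient descent code above.
% NOTE: the file name 'training_data.txt' is assumed for illustration.
raw = readmatrix('training_data.txt');          % 36 x 16 numeric matrix
varNames = [strcat("x", string(1:15)), "y"];    % x1..x15 plus the output y
data_training = array2table(raw, 'VariableNames', cellstr(varNames));

% Sanity check: 15 features plus one output column
assert(width(data_training) == 16, 'expected 16 columns');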

Solution

  • Not sure I'm following your logic, but it's quite obvious that 'e' (the error) should not be squared.

    Let's see what you should be using.

    theta is a column vector of unknowns, y is a column vector of measurements and X is the model matrix where each row is an 'example'. So you need to find theta such that:

    y = X*theta 
    

    Or equivalently, use an optimization method to find the theta that minimizes the sum of squared errors (this is what makes this a convex optimization problem):

    e[n] = (y - X*theta[n])
    
    ||e[n]||^2 --> minimize
    

    Gradient descent uses the gradient of the error function (with respect to theta) to update the theta vector:

    theta[n+1] = theta[n] + alpha*2*X'*e[n]
    

    (Note that e[n] and theta[n] are vectors; this is math notation, not MATLAB's. The plus sign comes from the chain rule: since e[n] = y - X*theta[n], the gradient of the squared error with respect to theta is -2*X'*e[n], and descending means subtracting that gradient.)

    So you see that e[n] is not squared in the update equation.
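
    As a concrete illustration (not part of the original answer), here is a minimal MATLAB sketch of this vectorized update, using the answer's conventions: each row of X is an example, and y and theta are column vectors. It rebuilds X and y from the question's data_training table and cols cell array; the 2/m scaling matches the question's cost function, and the value of alpha is only an assumption.

    m = height(data_training);                % number of examples
    y = data_training.y;                      % m x 1 measurements
    X = [ones(m,1), data_training{:, cols}];  % m x 16 model matrix (ones column + features)
    theta = zeros(size(X,2), 1);              % 16 x 1 unknowns

    alpha = 1e-2;                             % learning rate (assumed; may need to be much smaller for unscaled features)
    for n = 1:400
        e = y - X*theta;                      % error vector, NOT squared
        theta = theta + alpha*(2/m)*(X'*e);   % plus sign because e = y - X*theta
    end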