machine-learning, logistic-regression

How to derive an objective function for a multi-class logistic regression classifier using 1-of-k encoding?


I get what this wiki page says (http://en.wikipedia.org/wiki/Multinomial_logistic_regression), but I don't know how to derive the update rules for stochastic gradient descent. Sorry to ask this here (it is really about machine learning theory rather than an actual implementation). Could someone provide a solution with an explanation? Thanks in advance!
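
For concreteness, here is my own sketch of the setup (so the notation below is mine, not the wiki's): with 1-of-k targets t_ik, where t_ik = 1 exactly when sample i belongs to class k, and one weight vector w_k per class, the model and objective are

    P(y = k \mid x) = \frac{\exp(w_k^\top x)}{\sum_{j=1}^{K} \exp(w_j^\top x)},
    \qquad
    J(W) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} t_{ik} \log P(y_i = k \mid x_i).

What I can't work out is the gradient of J with respect to each w_k, and therefore the stochastic update.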


Solution

  • I happened to write code implementing softmax regression, referring mostly to this page: http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression

    This is the code I wrote in MATLAB; I hope it helps. The gradient and the update rule it is based on are written out after the listing.

    function y = sigmoid_multi(weight,x,class_index)
    %%  weight       feature_dim * class_num
    %%  x            feature_dim * 1
    %%  class_index  scalar
    %%  returns P(class = class_index | x) under the softmax model
        denom = eps;   % eps guards against division by zero
        class_num = size(weight,2);
        for i = 1:class_num
            denom = denom + exp(weight(:,i)'*x);
        end
        y = exp(weight(:,class_index)'*x)/denom;
    end
    
    function g = gradient(train_patterns,train_labels,weight)
    %%  gradient of object_function w.r.t. the weight matrix
    %%  for softmax regression:  dJ/dw_j = -(1/m) * sum_i (1{y_i==j} - P(j|x_i)) * x_i
        m = size(train_patterns,2);
        class_num = size(weight,2);
        g = zeros(size(weight));
        for j = 1:class_num
            for i = 1:m
                indicator = double(train_labels(i) == j);        % 1-of-k target t_ij
                p = sigmoid_multi(weight,train_patterns(:,i),j); % P(class j | x_i)
                g(:,j) = g(:,j) + (indicator - p)*train_patterns(:,i);
            end
        end
        g = -(g/m);
    end

    function J = object_function(train_patterns,train_labels,weight)
    %%  negative mean log-likelihood:  J = -(1/m) * sum_i log P(y_i | x_i)
        m = size(train_patterns,2);
        J = 0;
        for i = 1:m
            J = J + log( sigmoid_multi(weight,train_patterns(:,i),train_labels(i)) + eps);
        end
        J = -(J/m);
    end
    
    function weight = multi_logistic_train(train_patterns,train_labels,alpha)
    %%  weight          feature_dim * class_num
    %%  train_patterns  feature_dim * sample_num
    %%  train_labels    1 * sample_num  (integer labels 1..class_num)
    %%  alpha           scalar learning rate
         class_num = length(unique(train_labels));
         m = size(train_patterns,2); % sample_num
         n = size(train_patterns,1); % feature_dim
         weight = rand(n,class_num);
         for i = 1:40   % fixed number of (batch) gradient descent steps
            J = object_function(train_patterns,train_labels,weight);
            fprintf('objective function value : %f\n',J);
            weight = weight - alpha*gradient(train_patterns,train_labels,weight);
         end
    end
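
    For the update rule the question asks about: differentiating the objective with respect to the weight vector w_j of class j gives the usual softmax-regression gradient (essentially the same gradient as on the UFLDL page, minus its weight-decay term), where t_ij = 1 exactly when sample i has class j:

        \nabla_{w_j} J(W) = -\frac{1}{m} \sum_{i=1}^{m} \bigl( t_{ij} - P(y_i = j \mid x_i) \bigr)\, x_i

    A stochastic gradient descent step on a single sample (x_i, t_i) therefore updates every class weight vector as

        w_j \leftarrow w_j + \alpha \bigl( t_{ij} - P(y_i = j \mid x_i) \bigr)\, x_i, \qquad j = 1, \dots, K,

    with learning rate alpha. The code above applies the batch version of this step, averaging the gradient over all samples before each update.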
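
    Finally, a minimal sketch of how these functions might be called. The data, the learning rate 0.1, and the shapes here are illustrative assumptions, not values from the original post:

        %% toy example with three Gaussian blobs (purely illustrative)
        n = 2;  m = 150;  K = 3;
        centers = 3*randn(n,K);                          % one random center per class
        train_labels   = randi(K,1,m);                   % integer labels 1..K
        train_patterns = centers(:,train_labels) + 0.5*randn(n,m);
        weight = multi_logistic_train(train_patterns,train_labels,0.1);

        %% predict by picking the class with the largest score w_k'*x
        [~,predictions] = max(weight'*train_patterns,[],1);
        fprintf('training accuracy: %f\n',mean(predictions == train_labels));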