Tags: python, numpy, optimization, scipy, logistic-regression

Scipy fmin_tnc not optimizing cost function


Objective: optimize the cost function using the scipy fmin_tnc optimizer.

Issue: although the cost and gradient functions behave as expected when executed alone, after optimization with fmin_tnc the same initial parameters (an array of zeros) are returned.

See code below:

import numpy as np
import scipy.optimize as opt

def sigmoid(z_vec):
    return 1/(1 + np.exp(-z_vec))

def hypothesis(X_vec, weights_vec):
    _hypothesis = np.vectorize(sigmoid)
    return _hypothesis(X_vec.dot(weights_vec.T))

def cost(X_vec, weight_vec, labels):
    X_vec = np.matrix(X_vec)
    labels = np.matrix(Y_labels)
    weight_vec = np.matrix(weight_vec)

    lhs = np.multiply(-1*labels, np.log(hypothesis(X_vec, weight_vec)))
    rhs = np.multiply((1-labels), np.log(1 - hypothesis(X_vec, weight_vec)))  
    return np.sum(lhs - rhs)/labels.shape[0]

def gradient(X_vec, weight_vec, labels):
    X_vec = np.matrix(X_vec)
    labels = np.matrix(Y_labels)
    weight_vec = np.matrix(weight_vec)

    grad_result = np.zeros(weight_vec.shape[1])
    error  = hypothesis(X_vec, weight_vec) - labels
    for i in range(weight_vec.shape[1]):
        grad_result[i] = np.sum(np.multiply(error, X_vec[:,i])) / len(X_vec)
    return grad_result

# optimize cost function
# assume all train-set/labels and initial parameters are loaded correctly
result = opt.fmin_tnc(func=cost, x0=theta_weights, fprime=gradient, args=(X_examples, Y_labels))

Output:

result[0] ===> [0, 0, 0] ===> No optimization/learning happened.

I think I've got something wrong, but it's not obvious to me. Any insight would be helpful.


Solution

  • Your gradient is broken! (somehow: math vs. np shapes?)

    I'm not going to analyze your gradient in detail (it looks like logistic regression, so read up on that), but here is a demo which supports my claim:

    scipy.optimize.check_grad -> failing!

    from scipy.optimize import check_grad
    from sklearn.datasets import make_classification
    X_examples, Y_labels = make_classification()
    
    ...
    
    print(check_grad(cost, gradient, np.random.random(size=20), X_examples, Y_labels))
    # 61.666912359 (example value; indeterministic because of PRNG-usage above)
    # result: The square root of the sum of squares (i.e. the 2-norm) of the difference between
    # grad(x0, *args) and the finite difference approximation of grad using func
    # at the points x0.
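
    For contrast, here is what check_grad reports when the analytic gradient is correct. This is a minimal self-contained sketch (not part of the original answer); the toy function and names are illustrative:

    ```python
    import numpy as np
    from scipy.optimize import check_grad

    def f(x):
        return np.sum(x ** 2)  # f(x) = ||x||^2

    def g(x):
        return 2 * x           # exact analytic gradient of f

    # a correct gradient yields a tiny discrepancy vs. the finite-difference check
    err = check_grad(f, g, np.array([1.0, -2.0, 3.0]))
    print(err)  # tiny, on the order of the finite-difference step
    ```

    A large value from check_grad (like the ~61.7 above) therefore points at a mismatch between the analytic gradient and the cost function.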
    

    scipy.optimize.fmin_tnc(...fprime=None, approx_grad=True...) -> works!

    Using numerical differentiation automatically:

    result = opt.fmin_tnc(func=cost, x0=x0, fprime=None, approx_grad=True,
                          args=(X_examples, Y_labels))
    

    Output:

    NIT   NF   F                       GTG
      0    1  6.931471805599453E+01   3.69939729E+03
    tnc: fscale = 0.0164412
      1   14  5.488792098836968E+01   1.91876351E+03
      2   26  4.598767699927674E+01   5.90835511E+02
      3   39  4.255784649333560E+01   3.42105829E+02
      4   44  3.153577598035866E+01   3.09160832E+02
      5   50  2.224511577357391E+01   4.36685983E+01
      6   54  2.157944424362721E+01   3.39632081E+01
      7   59  2.136340974081865E+01   2.97596794E+01
      8   73  1.997400905570375E+01   1.08022452E+01
      9   75  1.984787529493228E+01   1.05379689E+01
     10   79  1.979578396181381E+01   1.16542972E+01
     11   88  1.939906531954665E+01   9.19521103E+00
    tnc: fscale = 0.329776
     12   91  1.867469105176042E+01   4.61533306E+00
     13   94  1.834698220306902E+01   1.37837652E+00
     14   98  1.818150860822102E+01   1.39090344E+00
     15  102  1.817553527302105E+01   1.22472879E+00
     16  118  1.810790027768505E+01   6.81994565E-01
     17  134  1.807103645037105E+01   1.11197844E+00
     18  148  1.805232746344993E+01   1.31428979E+00
     19  154  1.804213064407819E+01   7.10363935E-01
     20  168  1.803844265918305E+01   5.98840568E-01
     21  170  1.803426849431969E+01   7.44231905E-01
     22  175  1.803266497733540E+01   8.68201237E-01
     23  189  1.799550032713761E+01   1.13982866E+00
     24  191  1.799048283812780E+01   7.41273919E-01
     24  200  1.799048283812780E+01   7.41273919E-01
    tnc: Maximum number of function evaluations reached
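
    Note that this run stops on its evaluation budget ("Maximum number of function evaluations reached"), not on convergence; fmin_tnc accepts a maxfun parameter to raise that budget. A self-contained sketch on a toy quadratic (the function and numbers are illustrative, not from the question):

    ```python
    import numpy as np
    import scipy.optimize as opt

    def f(x):
        # toy quadratic with known minimum at [3, 3]
        return np.sum((x - 3.0) ** 2)

    # approx_grad=True uses finite differences; maxfun raises the evaluation budget
    x_opt, nfeval, rc = opt.fmin_tnc(func=f, x0=np.zeros(2),
                                     approx_grad=True, maxfun=1000)
    print(x_opt)  # near [3.0, 3.0]
    ```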
    

    Additional remark: np.matrix is not recommended (use np.array)!
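
    Putting both remarks together, here is a sketch of an np.array-based version whose gradient passes check_grad. The variable names and the synthetic data are illustrative, not from the question; note also that fmin_tnc calls func(x0, *args), so the weight vector must be the first parameter of both cost and gradient:

    ```python
    import numpy as np
    import scipy.optimize as opt
    from scipy.optimize import check_grad

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # weight vector FIRST: this is the signature fmin_tnc expects for func/fprime
    def cost(w, X, y):
        h = sigmoid(X.dot(w))
        return np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h)) / len(y)

    def gradient(w, X, y):
        error = sigmoid(X.dot(w)) - y
        return X.T.dot(error) / len(y)  # vectorized: no per-feature loop

    # noisy (non-separable) synthetic data so the optimum stays finite
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(50) > 0).astype(float)
    w0 = np.zeros(3)

    grad_err = check_grad(cost, gradient, w0, X, y)  # should be tiny now
    w_opt, nfeval, rc = opt.fmin_tnc(func=cost, x0=w0, fprime=gradient, args=(X, y))
    ```

    With a gradient that agrees with the cost, fmin_tnc actually moves away from the zero initialization instead of returning it unchanged.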