Tags: python, math, probability, newtons-method

Solving a non-linear equation with Lagrange multiplier estimation in Python


I've been struggling for days on this thing, but to no avail. I'm not very good at difficult math, let alone math at this level of difficulty.

I was trying to implement the maximum entropy application for the lottery in Python for my graduation assignment. Although the focus of the project was to implement a number of data mining techniques (decision trees, Apriori, k-means), which is already finished, I just could not pass up the opportunity to do something more advanced... but I guess this is too advanced for me.

So, my question is: how can I solve the non-linear equation (8) from the following paper?

reference1: http://eprints.ecs.soton.ac.uk/901/01/paper05.pdf

The method is based on the following paper:

reference2: http://www.stanford.edu/~cover/papers/paper91.pdf

Any help (theoretical or otherwise) will be deeply appreciated. Thanks.


Solution

  • You need to use equations 7 through 9 in combination. The only unknowns in the equations are the Lagrange multipliers, the lambdas. Everything else depends on the empirical data available and is thus just a set of known numbers.

    Given a set of values for the lambdas, you can calculate G(j,r) and the Jacobian J(j,i,r,s). In turn, once you know the residuals and the Jacobian, you can use Newton's method, given in equation 9, to find the roots of the system of equations, i.e., those values of lambda for which G(j,r) = 0.

    Thus, you use an initial guess at the values for the lambdas to calculate the other terms, then use those terms to update your guess. There's no conceptual challenge to working with equations 7 and 8 at all -- just plug in the values -- but they do involve summing a great many terms, so some care is warranted; a finite-difference cross-check on the Jacobian, sketched below, can help catch mistakes.
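
    One sanity check while debugging: compare the analytic Jacobian you code up from equation 8 against a finite-difference approximation. This is a minimal sketch, not anything from the paper; G stands for whatever function you write to evaluate the (flattened) residuals:

    import numpy as np

    def jacobian_fd(G, lam, eps=1e-7):
        # Forward-difference approximation to the Jacobian of G at lam.
        # G   : callable returning the flattened residual vector
        # lam : current guess for the Lagrange multipliers (1-D array)
        g0 = G(lam)
        J = np.empty((g0.size, lam.size))
        for k in range(lam.size):
            bumped = lam.copy()
            bumped[k] += eps
            J[:, k] = (G(bumped) - g0) / eps
        return J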

    Equation 9 is a little tricky, as it's not written very clearly. Since the paper describes a system of equations, you'd generally expect to solve a linear equation:

    J * d_lambda = -G
    

    where d_lambda is a vector of changes in the guess, G is a vector of values for the function, and J is a matrix of Jacobian values. The notation in the paper is pretty muddled, obscuring what should be a simple expression. You can get it into a clearer form by introducing a unified index a to replace the pair of indices i and s; the authors mention just this change in the discussion of the method, giving a formula for calculating the combined index in the second paragraph on page 4.
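
    In NumPy the flattening and the solve are one call each. Here is a minimal sketch, with random placeholder arrays standing in for the real G(j,r) and J values, and the Jacobian stored with axes ordered (j, r, i, s) so that reshape groups (j, r) into rows and (i, s) into columns; the combined index a = j*R + r is exactly what reshape produces for C-ordered arrays:

    import numpy as np

    # Placeholder index ranges; random stand-ins for the real arrays.
    Jn, Rn = 3, 4                              # ranges of j and r
    In, Sn = 3, 4                              # ranges of i and s
    rng = np.random.default_rng(0)
    G4 = rng.normal(size=(Jn, Rn))             # G(j, r)
    Jac4 = rng.normal(size=(Jn, Rn, In, Sn))   # J(j, r, i, s)

    # Combined indices a = j*Rn + r and b = i*Sn + s via reshape.
    G = G4.reshape(-1)
    Jmat = Jac4.reshape(G.size, -1)

    # One Newton step: solve J * d_lambda = -G.
    d_lambda = np.linalg.solve(Jmat, -G)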

    Overall, the procedure becomes (using the unified index):

    1. Choose some lambdas to act as your initial guess. Maybe zeros, or random numbers.
    2. Evaluate G(a) and J(a,b).
    3. Solve a system of linear equations to get the updates to your guess.
    4. If the updates are small compared to your guess, stop. Otherwise, apply the updates to form the new guess and go back to step 2.

    This looks quite feasible using NumPy. The paper talks about using a parallel computing strategy, but that was over ten years ago; it is a much smaller problem on today's hardware.
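
    To make the loop concrete, here is a generic Newton driver. It is a sketch under stated assumptions, not the paper's code: residuals and jacobian stand for whatever functions you write for equations 7 and 8 (flattened as above), and the toy 2x2 system at the bottom is only there to show the call pattern:

    import numpy as np

    def newton_solve(residuals, jacobian, lam0, tol=1e-8, max_iter=50):
        # Newton's method for G(lambda) = 0, i.e., steps 1-4 above.
        # residuals : callable returning the flattened residual vector G(lam)
        # jacobian  : callable returning the flattened Jacobian matrix J(lam)
        # lam0      : initial guess for the Lagrange multipliers (1-D array)
        lam = np.asarray(lam0, dtype=float)
        for _ in range(max_iter):
            G = residuals(lam)
            J = jacobian(lam)
            d_lambda = np.linalg.solve(J, -G)  # equation 9 as a linear solve
            lam = lam + d_lambda
            # Stop when the update is small relative to the guess (step 4).
            if np.linalg.norm(d_lambda) <= tol * (1.0 + np.linalg.norm(lam)):
                return lam
        raise RuntimeError("Newton iteration did not converge")

    # Toy usage: x^2 + y - 3 = 0 and x + y^2 - 5 = 0, with root (1, 2).
    G = lambda v: np.array([v[0]**2 + v[1] - 3.0, v[0] + v[1]**2 - 5.0])
    Jf = lambda v: np.array([[2.0*v[0], 1.0], [1.0, 2.0*v[1]]])
    print(newton_solve(G, Jf, np.array([1.0, 1.0])))   # ~ [1. 2.]

    If the Jacobian turns out to be nearly singular for some guesses, np.linalg.lstsq(J, -G, rcond=None)[0] is a drop-in replacement for the solve.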