Tags: statistics, variance, maximization

Maximizing variance of bounded variables


Let x_1, ..., x_i, ..., x_p be p real numbers such that 0 <= x_i <= b for all i. That is, each x_i can take any value between 0 and b. I'd like to find the values of the x_i that maximize the variance among them.

Do you have any hints? I'd like to use this result in my example code. Or is this question not well-defined?

At first I thought of something like x_1=0, x_2=...=x_p=b, but then I found this does not maximize the variance once p gets even moderately large.
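
For example, a quick numeric check (my own illustration, using b = 1 and p = 10) shows that splitting the points evenly between 0 and b gives a larger variance than putting a single point at 0:

    import numpy as np

    b, p = 1.0, 10                                       # illustrative values
    one_low = [0.0] + [b] * (p - 1)                      # x_1 = 0, the rest at b
    even_split = [0.0] * (p // 2) + [b] * (p - p // 2)   # half at 0, half at b

    print(np.var(one_low))     # 0.09
    print(np.var(even_split))  # 0.25 -> larger variance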

Thanks


Solution

  • After comments, I did some trials on a numerical check for your problem. There's still some work to do, but I hope it puts you on the right track. Also, I've used Python, and I have no idea whether that is OK for you or not. You can surely find equivalent ways to do it in MATLAB and R.

    I use the well-known property of the variance, Var(X) = E[X^2] - E[X]^2, to make the derivatives easier. (If you have doubts, check the Wikipedia article on variance.)
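
    As a quick sanity check (my own addition, using the same values as the random initialization further below), that identity matches NumPy's population variance:

    import numpy as np

    x = np.array([1.0, 2.0, 8.0, 7.0])
    print(np.mean(x**2) - np.mean(x)**2)  # 9.25
    print(np.var(x))                      # 9.25, same value (np.var uses ddof=0 by default)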

    The Python package scipy.optimize has a method minimize to numerically minimize a function. You can select an algorithm for solving the problem; I'm not so familiar with the available algorithms, and I was looking for the well-known plain gradient descent (well, at least I hope you know it). I think a close one could be SLSQP, but honestly I'm not 100% sure on the details.

    And finally, I didn't make sure that the function you're minimizing is convex, nor did I figure out whether it has local minima, but the results look fine.

    I give you the Python code below, in case it is useful, but the bottom line is that I'd suggest you:

    • Choose a language/package you are familiar with
    • Choose an algorithm for the optimization
    • It would be nice to prove that the function is convex (so that the solution converges)
    • Set the parameters for which you want to run the check

    Code below. Hope it helps.

    I'm not going to post the algebra for the derivatives; I hope you can work them out yourself. And you must take into account that you are maximizing and not minimizing, so you have to multiply by -1, as explained (I hope quite clearly) here (look for "maximizing").

    Setup,

    In [1]:
    
    from scipy.optimize import minimize
    import numpy as np
    

    The function you are maximizing, that is, the variance (remember the trick E[X^2] - E[X]^2, and the -1),

    In [86]:
    
    def func(x):
        return (-1) * (np.mean([xi**2 for xi in x]) - np.mean(x)**2)
    

    The derivative of that function with respect to each xi of the vector x (I hope you can differentiate it and get the same result),

    In [87]:
    
    def func_deriv(x):
        # d/dx_i of -(E[X^2] - E[X]^2) is -(2*x_i/n - 2*sum(x)/n**2)
        n = len(x)
        l = []
        for i in range(n):
            # 2.0 keeps the division a float division, also under Python 2
            res = (2.0 * x[i] / n) - ((2.0 / (n**2)) * (x[i] + sum([x[j] for j in range(n) if j != i])))
            l += [(-1) * res]
        return np.array(l)
    

    Actually, I made quite a few mistakes when writing this function, both in the derivative and in the Python implementation. But there is a trick that helps a lot, which is to check the derivative numerically, by adding and subtracting a small epsilon in every dimension and calculating the slope of the curve (see the Wikipedia article on numerical differentiation). This would be the function that approximates the derivative,

    In [72]:
    
    def func_deriv_approx(x, epsilon=0.00001):
        # central-difference approximation of the gradient of func
        l = []
        for i in range(len(x)):
            x_plus = [x[j] + ((j == i) * epsilon) for j in range(len(x))]
            x_minus = [x[j] - ((j == i) * epsilon) for j in range(len(x))]
            res = (func(x_plus) - func(x_minus)) / (2 * epsilon)
            l += [res]
        return l
    

    And then I checked func_deriv_approx against func_deriv for a bunch of values.
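
    The check might look something like this (my own sketch, comparing the two gradients at a few random points in [0, 10]^4 with np.allclose, and reusing func_deriv and func_deriv_approx from above):

    import random
    import numpy as np

    for _ in range(5):
        x = [random.uniform(0, 10) for i in range(4)]
        assert np.allclose(func_deriv(x), func_deriv_approx(x), atol=1e-6)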

    And the minimization itself. If I initialize the values to the solution we suspect is right, it works fine: it only iterates once and gives the expected result,

    In [99]:
    
    res = minimize(func, [0, 0, 10, 10], jac=func_deriv, bounds=[(0,10) for i in range(4)],
                   method='SLSQP', options={'disp': True})
    Optimization terminated successfully.    (Exit mode 0)
                Current function value: -25.0
                Iterations: 1
                Function evaluations: 1
                Gradient evaluations: 1
    
    In [100]:
    
    print(res.x)
    [  0.   0.  10.  10.]
    

    (Note that you could use any length you want, since func and func_deriv are written in a way that accepts vectors of any length.)

    You could initialize randomly like this,

    In [81]:
    
    import random
    xinit = [random.randint(0, 10) for i in range(4)]
    
    In [82]:
    
    xinit
    Out[82]:
    [1, 2, 8, 7]
    

    And then the maximization is,

    In [83]:
    
    res = minimize(func, xinit, jac=func_deriv, bounds=[(0,10) for i in range(4)],
                   method='SLSQP', options={'disp': True})
    Optimization terminated successfully.    (Exit mode 0)
                Current function value: -25.0
                Iterations: 3
                Function evaluations: 3
                Gradient evaluations: 3
    In [84]:
    
    print(res.x)
    [  1.27087156e-13   1.13797860e-13   1.00000000e+01   1.00000000e+01]
    

    Or finally for length = 100,

    In [85]:
    
    import random
    xinit = [random.randint(0, 10) for i in range(100)]
    
    In [91]:
    
    res = minimize(func, xinit, jac=func_deriv, bounds=[(0,10) for i in range(100)],
                   method='SLSQP', options={'disp': True})
    Optimization terminated successfully.    (Exit mode 0)
                Current function value: -24.91
                Iterations: 23
                Function evaluations: 22
                Gradient evaluations: 22
    In [92]:
    
    print(res.x)
    [  2.49143492e-16   1.00000000e+01   1.00000000e+01  -2.22962789e-16
      -3.67692105e-17   1.00000000e+01  -8.83129256e-17   1.00000000e+01
       7.41356521e-17   3.45804774e-17  -8.88402036e-17   1.31576404e-16
       1.00000000e+01   1.00000000e+01   1.00000000e+01   1.00000000e+01
      -3.81854094e-17   1.00000000e+01   1.25586928e-16   1.09703896e-16
      -5.13701064e-17   9.47426071e-17   1.00000000e+01   1.00000000e+01
       2.06912944e-17   1.00000000e+01   1.00000000e+01   1.00000000e+01
      -5.95921560e-17   1.00000000e+01   1.94905365e-16   1.00000000e+01
      -1.17250430e-16   1.32482359e-16   4.42735651e-17   1.00000000e+01
      -2.07352528e-18   6.31602823e-17  -1.20809001e-17   1.00000000e+01
       8.82956806e-17   1.00000000e+01   1.00000000e+01   1.00000000e+01
       1.00000000e+01   1.00000000e+01   3.29717355e-16   1.00000000e+01
       1.00000000e+01   1.00000000e+01   1.00000000e+01   1.00000000e+01
       1.43180544e-16   1.00000000e+01   1.00000000e+01   1.00000000e+01
       1.00000000e+01   1.00000000e+01   2.31039883e-17   1.06524134e-16
       1.00000000e+01   1.00000000e+01   1.00000000e+01   1.00000000e+01
       1.77002357e-16   1.52683194e-16   7.31516095e-17   1.00000000e+01
       1.00000000e+01   3.07596508e-17   1.17683979e-16  -6.31665821e-17
       1.00000000e+01   2.04530928e-16   1.00276075e-16  -1.20572493e-17
      -3.84144993e-17   6.74420338e-17   1.00000000e+01   1.00000000e+01
      -9.66066818e-17   1.00000000e+01   7.47080743e-17   4.82924982e-17
       1.00000000e+01  -9.42773478e-17   1.00000000e+01   1.00000000e+01
       1.00000000e+01   1.00000000e+01   1.00000000e+01   5.01810185e-17
      -1.75162038e-17   1.00000000e+01   6.00111991e-17   1.00000000e+01
       1.00000000e+01   7.62548028e-17  -6.90706135e-17   1.00000000e+01]
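
    Note that this last run stops at 24.91 rather than 25.0: the solver ends with a slightly uneven split between 0 and 10 instead of an exact 50/50 split, which is the kind of local optimum I mentioned above. A quick comparison (my own addition) against the even split:

    import numpy as np

    # variance of an exact 50/50 split between 0 and 10, for reference
    print(np.var([0.0] * 50 + [10.0] * 50))  # 25.0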