Minimize variance python


Not sure how to proceed with this. I have a list of numbers (a list of lists of numbers, to be exact), but these numbers have an ambiguity: x, x+1 and x-1 are exactly the same thing for me. However, I'd like to minimize the variance of the list by changing the elements. Here's what I've tried so far (with a sample list that I know it doesn't work for):

import numpy as np
from scipy import stats

lst = [0.474, 0.122, 0.0867, 0.896, 0.979]
def min_var(lst):
    mean = np.mean(lst)  # was assigned to `mode`, but the loop below compares against `mean`
    var = np.var(lst)
    result = []
    for item in list(lst):
        if item < mean: # not sure this is a good test
            new_item = item + 1
        elif item > mean:
            new_item = item - 1
        else:
            new_item = item
        new_list = [new_item if x==item else x for x in lst]
        new_var = np.var(new_list)
        if new_var < var:
            var = new_var
            lst = new_list
    return lst

What the function does is add 1 to the 3rd element. However, the minimum variance occurs when you subtract 1 from the 4th and 5th. This happens because I'm minimizing the variance after each item, not allowing for multiple changes. How could I implement multiple changes, preferably without looking at all possible solutions (3**n if I'm not mistaken)? Thanks a lot
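For reference, the exhaustive 3**n search mentioned above can be sketched as follows. It is only practical for short lists, but it gives a baseline to check faster approaches against. The function name `brute_force_min_var` is made up for this illustration.

```python
import itertools

import numpy as np


def brute_force_min_var(values):
    """Try every combination of -1, 0, +1 shifts; keep the lowest-variance one."""
    values = np.asarray(values, dtype=float)
    best_shifted, best_var = values, np.var(values)
    # 3**n candidate shift vectors, so this is only feasible for small n.
    for shifts in itertools.product((-1.0, 0.0, 1.0), repeat=len(values)):
        candidate = values + np.array(shifts)
        candidate_var = np.var(candidate)
        if candidate_var < best_var:
            best_shifted, best_var = candidate, candidate_var
    return best_shifted, best_var


lst = [0.474, 0.122, 0.0867, 0.896, 0.979]
shifted, var = brute_force_min_var(lst)
print(shifted, round(var, 4))
```

Adding the same integer to every element leaves the variance unchanged, so the list this returns may differ from the expected one by a common shift of 1. The relative shifts (the 4th and 5th elements one lower than the others) should be the same.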


Solution

  • You can treat this as the problem of finding the delta that minimizes var((x - delta) % 1), where x is your array of values. Once you have it, add or subtract integers from each value until it lies in the range delta - 1 <= x[i] < delta. This isn't a continuous function of delta, so the usual solvers in scipy.optimize aren't a good fit. But the value of var((x - delta) % 1) changes only when delta passes one of the values in x, so we only need to test each value in x as a candidate delta and keep the one that gives the smallest variance.

    import numpy as np
    
    x = np.array([0.474, 0.122, 0.0867, 0.896, 0.979])
    
    # find the value of delta
    delta = x[0]
    min_var = np.var((x - delta) % 1)
    for val in x:
        current_var = np.var((x - val) % 1)
        if current_var < min_var:
            min_var = current_var
            delta = val
    
    print(delta)
    
    # use `delta` to subtract and add the right integer from each value
    # we want values in the range delta - 1 <= val < delta
    for i, val in enumerate(x):
        while val >= delta:
            val -= 1.
        while val < delta - 1.:
            val += 1.
        x[i] = val
    
    print(x)
    

    For this example, it finds your desired solution of [ 0.474 0.122 0.0867 -0.104 -0.021 ] with a variance of 0.0392.
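Since the question mentions a list of lists, it may help to wrap the same idea in a function. The following is a minimal sketch, not a tuned implementation: it uses NumPy broadcasting to test every candidate delta at once, which needs O(n**2) memory. The name `minimize_circular_variance` is made up for this illustration.

```python
import numpy as np


def minimize_circular_variance(values):
    """Return (shifted values, delta, variance) for the best cut point."""
    values = np.asarray(values, dtype=float)
    # Row k holds (values - values[k]) % 1, i.e. the list cut at values[k].
    wrapped = (values[None, :] - values[:, None]) % 1.0
    variances = wrapped.var(axis=1)
    best = np.argmin(variances)
    delta = values[best]
    # Map every value into delta - 1 <= val < delta, as in the loop above.
    shifted = delta - 1.0 + wrapped[best]
    return shifted, delta, variances[best]


x = [0.474, 0.122, 0.0867, 0.896, 0.979]
shifted, delta, var = minimize_circular_variance(x)
print(delta, shifted, round(var, 4))
```

For a list of lists, you could call this once per inner list, for example `[minimize_circular_variance(row)[0] for row in rows]`, where `rows` stands for your own data.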