Subsampling a 1D array of integers so that the sum hits a target value in Python


I have two 1D arrays of integers whose sums differ, for example:

a = [1,2,2,0,3,5]
b = [0,0,3,2,0,0]

I would like the sum of each array to equal the smaller of the two sums. However, I want to keep the values as integers, not floats, so dividing is not an option. The solution appears to be some subsampling of the array with the larger sum, so that its sum equals that of the other one:

target = min(sum(a), sum(b))

However, I cannot find a function that performs such subsampling. The only ones I found are in scipy, but they seem dedicated to audio signal processing. The alternative was a function from the scikit-bio package, but it does not work on Python 3.7.


Solution

  • You could convert the array to indices, sample the indices and convert back to values as follows:

    import numpy as np
    np.random.seed(0)
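    a = np.array([1,2,2,0,3,5])
    b = np.array([0,0,3,2,0,0])
    # The target is the smaller of the two sums (an integer, here 5)
    target = min(a.sum(), b.sum())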
    
    # Build an array of indices in which each index i
    # is repeated a[i] times
    a_idx = np.repeat(np.arange(len(a)), a)
    # [0, 1, 1, 2, 2, 4, 4, 4, 5, 5, 5, 5, 5]
    
    # Randomly shuffle the indices and keep the first `target` of them
    a_sub_idx = np.random.permutation(a_idx)[:target]
    # [4, 1, 2, 2, 5]
    
    # Count the number of occurrences of each sampled index
    a_sub_idx, a_sub_vals = np.unique(a_sub_idx, return_counts=True)
    # Build a new integer array holding the counts at the sampled indices
    a_sub = np.zeros(a.shape, dtype=int)
    a_sub[a_sub_idx] = a_sub_vals
    # [0, 1, 2, 0, 1, 1]
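
The steps above could be wrapped in a small reusable function. The following is only a sketch under a few assumptions: the name `subsample_to_sum` is hypothetical, the input is assumed to hold non-negative integers, and it uses NumPy's `Generator` API (`np.random.default_rng`) with `np.bincount` in place of `np.random.permutation` and `np.unique`. The idea is unchanged: expand the counts into one index per unit, draw `target` units without replacement, and count them back.

```python
import numpy as np

def subsample_to_sum(values, target, rng=None):
    """Randomly drop units from `values` so that its sum equals `target`.

    Hypothetical helper; assumes `values` holds non-negative integers.
    """
    values = np.asarray(values)
    if target > values.sum():
        raise ValueError("target exceeds the sum of the array")
    rng = np.random.default_rng() if rng is None else rng
    # One entry per unit: index i appears values[i] times
    units = np.repeat(np.arange(len(values)), values)
    # Draw `target` units without replacement
    kept = rng.choice(units, size=target, replace=False)
    # Count how many units of each index were kept (integer result)
    return np.bincount(kept, minlength=len(values))

a = np.array([1, 2, 2, 0, 3, 5])
b = np.array([0, 0, 3, 2, 0, 0])
target = min(a.sum(), b.sum())

rng = np.random.default_rng(0)
a_sub = subsample_to_sum(a, target, rng)
b_sub = subsample_to_sum(b, target, rng)  # already at the target, so unchanged
```

Both results stay integer arrays, and no entry of `a_sub` can exceed the corresponding entry of `a`, because each unit can be drawn at most once. If your NumPy version provides it, `Generator.multivariate_hypergeometric(colors, nsample)` draws the same kind of sample directly from the counts.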