Python - How to generate a binning index for a list?


I have 10 bins:

    bins = [0,1,2,3,4,5,6,7,8,9]

I have a list of 25 values:

    values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]

I want to bin the values sequentially into the bins so each value is grouped into its bin:

    binnedValues = [[10,0],[0,14,14],[123,235],[0,0,0],[0,0],[12,12,1235],[23,234],[15,15,23],[136,34,34],[37,45]]

As you can see, the number of values in each bin is not always the same, because len(values) is not a multiple of len(bins).

Also, I have lots of different values lists, all of different sizes, so I need to do this many times for the same number of bins but different lengths of values lists. The above is just an example: the real number of bins is 10k, and the real len(values) ranges from ~10k to ~750k.

Is there a way to do this consistently? I need to maintain the order of the values, but split the values list evenly so that each bin gets a 'fair' and 'even' share of the values.

I think I can use numpy.digitize, but having had a look, I can't see how to generate the 'binned' list.


Solution

  • Are you trying to split the list into lists of alternating size between 2 and 3 elements? That's doable, then.

    from itertools import cycle
    
    values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]
    splits = cycle([2,3])
    bins = []
    count = 0
    
    while count < len(values):
        splitby = next(splits)  # next(it) works in Python 2.6+ and 3
        bins.append(values[count:count+splitby])
        count += splitby
    
    print(bins)
    

    Edit:

    Ah, I see what you're requesting... sort of. Something more like:

    from itertools import cycle
    from math import floor, ceil

    values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]
    number_bins = 10
    bins_lower = int(floor(len(values) / float(number_bins)))
    bins_upper = int(ceil(len(values) / float(number_bins)))
    
    splits = cycle([bins_lower, bins_upper])
    bins = []
    count = 0
    
    while count < len(values):
        splitby = next(splits)  # next(it) works in Python 2.6+ and 3
        bins.append(values[count:count+splitby])
        count += splitby
    
    print(bins)
    

    If you want more variety in bin size, you can add more numbers to splits.
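
As a rough illustration of that idea, the sketch below wraps the loop in a helper that accepts any repeating pattern of sizes. The name `split_by_pattern` and the `[1, 2, 3]` pattern are made up for this example. Note that the number of bins then depends on the pattern and the list length, so it is not guaranteed to equal a target bin count.

```python
from itertools import cycle


def split_by_pattern(values, pattern):
    """Split values into consecutive chunks whose sizes repeat `pattern`.

    Hypothetical helper for illustration; the last chunk may be shorter
    than its requested size if the values run out.
    """
    if not pattern or any(size <= 0 for size in pattern):
        raise ValueError("pattern must contain only positive sizes")
    sizes = cycle(pattern)
    chunks = []
    start = 0
    while start < len(values):
        size = next(sizes)
        chunks.append(values[start:start + size])
        start += size
    return chunks


print(split_by_pattern(list(range(10)), [1, 2, 3]))
# [[0], [1, 2], [3, 4, 5], [6], [7, 8], [9]]
```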

    Edit 2:

    Ashwin's way, which is more concise without being harder to understand.

    from itertools import cycle, islice
    from math import floor, ceil
    
    values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]
    number_bins = 10
    bins_lower = int(floor(len(values) / float(number_bins)))
    bins_upper = int(ceil(len(values) / float(number_bins)))
    
    splits = cycle([bins_lower, bins_upper])
    
    it = iter(values)
    # islice pulls the next chunk from the shared iterator on each pass
    bins = [list(islice(it, next(splits))) for _ in range(number_bins)]
    print(bins)
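
One caveat: alternating between the floor and ceiling sizes covers every value in exactly `number_bins` bins only when the sizes happen to add up, as they do for 25 values and 10 bins. With 27 values, for instance, ten alternating chunks of 2 and 3 would consume only 25 of them. A minimal sketch of one alternative, assuming it is acceptable to derive the bin boundaries by integer arithmetic, is below. `split_evenly` is a hypothetical helper name, not part of the answer above. Bin `i` covers `values[i * n // k : (i + 1) * n // k]`, so the order is kept, every value lands in a bin, and bin sizes differ by at most one.

```python
def split_evenly(values, number_bins):
    """Split values into exactly number_bins consecutive chunks.

    Hypothetical helper for illustration. Chunk sizes differ by at most
    one; chunks are empty when there are fewer values than bins.
    """
    if number_bins <= 0:
        raise ValueError("number_bins must be positive")
    n = len(values)
    return [
        values[i * n // number_bins:(i + 1) * n // number_bins]
        for i in range(number_bins)
    ]


values = [10, 0, 0, 14, 14, 123, 235, 0, 0, 0, 0, 0, 12, 12, 1235,
          23, 234, 15, 15, 23, 136, 34, 34, 37, 45]
bins = split_evenly(values, 10)
print(bins)
print([len(b) for b in bins])
```

For the 25 example values this produces the same alternating sizes of 2 and 3 as the code above. Because each bin is a plain slice, the cost is one pass over the list, which should be fine for 10k bins and lists of several hundred thousand values.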