Python: define individual binning


I am trying to define my own bins and calculate the mean of some other columns of my dataframe over these bins. Unfortunately, it only works with integer inputs, as you can see below. Here "step_size" is the width of one bin, and I would like to use float values such as 0.109, which corresponds to 0.109 seconds. Do you have any idea how I can do this? I think the problem is in the definition of "create_bins", but I cannot fix it. The goal is to get this: [(0, 0.109), (0.109, 0.218), (0.218, 0.327), ...]

Greets

# =============================================================================
# Define parameters
# =============================================================================
seconds_min = 0 
seconds_max = 9000 
step_size = 1 
bin_number = int((seconds_max-seconds_min)/step_size)


# =============================================================================
# Define function to create your own individual binning

# lower_bound defines the lowest value of the binning interval
# width defines the width of the binning interval
# quantity defines the number of bins
# =============================================================================
def create_bins(lower_bound, width, quantity):
    bins = []
    for low in range(lower_bound, 
                      lower_bound + quantity * width + 1, width):
        bins.append((low, low+width))
    return bins


# =============================================================================
# Create binning list
# =============================================================================
bin_list = create_bins(lower_bound=seconds_min,
                    width=step_size,
                    quantity=bin_number)

print(bin_list)

Solution

  • The problem is that the built-in range function only accepts integer arguments, so a float step such as 0.109 raises a TypeError.

    You can use the numeric_range function from the third-party more_itertools package instead:

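    from more_itertools import numeric_range

    seconds_min = 0
    seconds_max = 9
    step_size = 0.109
    bin_number = int((seconds_max - seconds_min) / step_size)


    def create_bins(lower_bound, width, quantity):
        bins = []
        # Stop half a step before the end of the last bin: the "+ 1" from the
        # integer version would add extra bins past seconds_max when width < 1,
        # and the half-step margin keeps float rounding from changing the count.
        stop = lower_bound + (quantity - 0.5) * width
        for low in numeric_range(lower_bound, stop, width):
            bins.append((low, low + width))
        return bins


    bin_list = create_bins(lower_bound=seconds_min,
                           width=step_size,
                           quantity=bin_number)

    print(bin_list)
    # [(0.0, 0.109), (0.109, 0.218), (0.218, 0.327), ...]
    # (later edges may show small floating-point rounding)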
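
If you would rather avoid the extra dependency, a minimal sketch using only the built-in range could look like the following. It loops over integer bin indices and derives each edge from the index, so rounding error does not accumulate and neighbouring bins share exactly the same edge. Note that mean_per_bin is a hypothetical helper that is not part of the original question or answer; it assumes the times and values are plain Python sequences of equal length rather than dataframe columns.

```python
from collections import defaultdict


def create_bins(lower_bound, width, quantity):
    """Return `quantity` consecutive (low, high) bins of size `width`."""
    bins = []
    for i in range(quantity):
        # Derive both edges from the index instead of adding `width` repeatedly.
        low = lower_bound + i * width
        high = lower_bound + (i + 1) * width
        bins.append((low, high))
    return bins


def mean_per_bin(times, values, lower_bound, width, quantity):
    """Hypothetical helper: average `values` grouped by the bin of `times`.

    Returns a dict mapping bin index to the mean; empty bins are omitted.
    Values that sit exactly on a bin edge may land in either neighbouring
    bin because of floating-point rounding.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(times, values):
        index = int((t - lower_bound) // width)
        if 0 <= index < quantity:  # ignore samples outside the binned range
            sums[index] += v
            counts[index] += 1
    return {i: sums[i] / counts[i] for i in sorted(counts)}


seconds_min = 0
seconds_max = 9
step_size = 0.109
bin_number = int((seconds_max - seconds_min) / step_size)

bin_list = create_bins(seconds_min, step_size, bin_number)
print(len(bin_list))  # 82
print(bin_list[0])    # (0.0, 0.109)

means = mean_per_bin([0.05, 0.06, 0.2], [1.0, 3.0, 10.0],
                     seconds_min, step_size, bin_number)
print(means)          # {0: 2.0, 1: 10.0}
```

For data that already lives in a dataframe, the same bin edges could instead be handed to a library binning routine; the sketch above only illustrates the arithmetic.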