
Group numbers into bins based on offset with Python


I have a list like this:

ls = [0, 1, 2, 4, 6, 7]  # it won't have duplicates and it is sorted

Now I want to group this list into bins based on an offset (in this example offset=1), which should return this:

[[0, 1, 2], [4], [6, 7]]
# Note:
# 0 and 2 differ by more than the offset, but each element is
# within the offset of its neighbour (0 and 1, 1 and 2), and this is what I want

Is there a high-level function in numpy, scipy, pandas, etc. that provides my desired result?

Note: The returned data structure doesn't have to be a list; any type is welcome.


Solution

  • Using pure Python:

    ls = [0, 1, 2, 4, 6, 7]
    
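    def group(l, offset=1):
        # Split a sorted list into runs whose neighbours differ by at most `offset`.
        if not l:
            return []  # nothing to group; also avoids an IndexError on l[0]
        out = []
        tmp = []
        prev = l[0]
        for val in l:
            if val - prev > offset:
                # gap too large: close the current bin and start a new one
                out.append(tmp)
                tmp = []
            tmp.append(val)
            prev = val
        out.append(tmp)  # flush the last bin
        return out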
    
    group(ls)
    # [[0, 1, 2], [4], [6, 7]]
    

    With pandas:

    import pandas as pd
    
    offset = 1
    
    s = pd.Series(ls)
    s.groupby(s.diff().gt(offset).cumsum()).agg(list)
    

    output:

    0    [0, 1, 2]
    1          [4]
    2       [6, 7]
    dtype: object
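
To see why the one-liner works, it may help to split the chain into its intermediate steps. This is only a sketch on the example data; the names `gaps`, `breaks`, `labels` and `bins` are illustrative and not part of the original answer.

```python
import pandas as pd

ls = [0, 1, 2, 4, 6, 7]
offset = 1

s = pd.Series(ls)

gaps = s.diff()            # difference to the previous element; the first one is NaN
breaks = gaps.gt(offset)   # True where the gap exceeds the offset (NaN compares as False)
labels = breaks.cumsum()   # running count of breaks, i.e. one bin label per element

print(labels.tolist())     # [0, 0, 0, 1, 2, 2]

# Group the values by their bin label and collect each group into a plain list
bins = s.groupby(labels).agg(list).tolist()
print(bins)                # [[0, 1, 2], [4], [6, 7]]
```

Calling `.tolist()` at the end is optional; it just turns the resulting Series into a nested Python list like the one asked for in the question.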
    

    With numpy:

    import numpy as np
    
    offset = 1
    
    a = np.split(ls, np.nonzero(np.diff(ls)>offset)[0]+1)
    # [array([0, 1, 2]), array([4]), array([6, 7])]
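
The numpy version can be unpacked the same way. The sketch below shows where the split indices come from and, in case plain lists are preferred over arrays, converts the result; the names `gaps`, `cut_points`, `parts` and `bins_np` are made up for illustration.

```python
import numpy as np

ls = [0, 1, 2, 4, 6, 7]
offset = 1

arr = np.asarray(ls)

gaps = np.diff(arr)                             # [1, 1, 2, 2, 1]
# Positions in `gaps` where the jump is too large, shifted by one so that
# they point at the first element of each new bin
cut_points = np.nonzero(gaps > offset)[0] + 1   # [3, 4]

parts = np.split(arr, cut_points)               # list of arrays, one per bin

# Optional: convert the arrays to plain Python lists
bins_np = [part.tolist() for part in parts]
print(bins_np)                                  # [[0, 1, 2], [4], [6, 7]]
```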