pandas, dataframe, pandas-groupby

Pandas Groupby integer with margin of error


I have a DataFrame with two integer columns, start and end, that mark where a string of text begins and ends. I'd like to group my rows by text length (end - start), but with a margin of error of ±5 characters, so that rows would be grouped like this:

 start    end
 0        251
 1        250
 2        250

 0        500
 1        500
 0        499

How would I achieve something like this? Here is the code I am using right now:

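import pandas as pd

d = {'text': ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
     'start': [0, 1, 0, 2, 1, 0],
     'end': [250, 500, 501, 251, 249, 499]}

df = pd.DataFrame(data=d)

# Groups on exact (start, end) pairs, so near-equal lengths land in separate groups
df = df.groupby(['start', 'end'])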
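To see why the snippet above does not give the desired result, here is a small sketch using the same sample data. The names `exact_groups` and `lengths` are only illustrative and are not part of the original question.

```python
import pandas as pd

d = {'text': ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
     'start': [0, 1, 0, 2, 1, 0],
     'end': [250, 500, 501, 251, 249, 499]}
df = pd.DataFrame(data=d)

# Grouping on the exact (start, end) pair: two rows share a group
# only when both values match exactly.
exact_groups = df.groupby(['start', 'end'])
print(exact_groups.ngroups)  # every pair in the sample is distinct

# The quantity the question actually wants to group on is the span length.
lengths = df['end'] - df['start']
print(lengths.tolist())
```

Since no two rows share the same (start, end) pair, each row ends up in its own group, even though the lengths cluster around 250 and 500.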

Solution

  • I ended up solving the problem by rounding the length of my text.

    df['rounded_length'] = (df['end'] - df['start']).round(-1)
    
    df = df.groupby('rounded_length')
    

    Every length becomes a multiple of 10, so rows whose lengths round to the same multiple land in the same group.
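For reference, here is the rounding approach applied to the sample data from the question, as a self-contained sketch. One caveat: rounding puts values into fixed bins rather than applying a true ±5 tolerance, so two lengths that differ by only 2 (such as 244 and 246) can still land in different groups. The second half of the sketch shows one possible alternative, which is an assumption and not part of the original answer: sort by length and start a new group whenever the gap to the previous length exceeds the tolerance. The names `length`, `tolerance`, and `gap_group` are illustrative.

```python
import pandas as pd

d = {'text': ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
     'start': [0, 1, 0, 2, 1, 0],
     'end': [250, 500, 501, 251, 249, 499]}
df = pd.DataFrame(data=d)

# Length of each text span.
df['length'] = df['end'] - df['start']

# Approach from the answer: round each length to the nearest multiple of 10.
df['rounded_length'] = df['length'].round(-1)
for key, group in df.groupby('rounded_length'):
    print(key, group['text'].tolist())

# Caveat: close values can straddle a bin boundary.
boundary = pd.Series([244, 246]).round(-1)
print(boundary.tolist())

# Assumed alternative (not from the answer): sort the lengths, then start a
# new group whenever the gap to the previous length exceeds the tolerance.
tolerance = 5
sorted_lengths = df['length'].sort_values()
# Assignment aligns on the index, so each row gets its own group id.
df['gap_group'] = (sorted_lengths.diff() > tolerance).cumsum()
for key, group in df.groupby('gap_group'):
    print(key, group['text'].tolist())
```

On this sample, both approaches produce the same two groups. The gap-based version has its own trade-off: groups can chain, so lengths of 0, 5, 10, and 15 would all fall into a single group even though the first and last differ by 15.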