I have a DataFrame with two integer columns that represent the start and end of a span of text. I'd like to group my rows by text length (end - start), but with a margin of error of ±5 characters, so that rows would be grouped something like this:
start end
0 251
1 250
2 250
0 500
1 500
0 499
How would I achieve something like this? Here is the code I am using right now:
import pandas as pd

d = {'text': ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
     'start': [0, 1, 0, 2, 1, 0],
     'end': [250, 500, 501, 251, 249, 499]}
df = pd.DataFrame(data=d)
# Groups only rows whose start and end match exactly
df = df.groupby(['start', 'end'])
I ended up solving the problem by rounding the length of my text to the nearest 10:
df['rounded_length'] = (df['end'] - df['start']).round(-1)
df = df.groupby('rounded_length')
All the lengths become multiples of 10, so I can group on the rounded value.
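For reference, here is a minimal runnable sketch of the rounding approach on the sample data from the question. One thing to be aware of: rounding creates fixed buckets, so two lengths on either side of a bucket edge (say 254 and 256) land in different groups even though they differ by only 2. The second half of the sketch shows one possible alternative, which is my own assumption and not part of the original solution: sort by length and start a new group whenever the gap to the previous length exceeds the tolerance. The names `length`, `tolerance`, `gap_group`, `rounded_groups` and `gap_groups` are made up for illustration.

```python
import pandas as pd

d = {'text': ["aaa", "bbb", "ccc", "ddd", "eee", "fff"],
     'start': [0, 1, 0, 2, 1, 0],
     'end': [250, 500, 501, 251, 249, 499]}
df = pd.DataFrame(data=d)

# Length of each text span.
df['length'] = df['end'] - df['start']

# Rounding approach: bucket lengths to the nearest multiple of 10.
df['rounded_length'] = df['length'].round(-1)
rounded_groups = df.groupby('rounded_length')['text'].apply(list).to_dict()

# Alternative (assumption, not from the original solution):
# sort by length and open a new group whenever the gap to the
# previous length is larger than the tolerance.
tolerance = 5
ordered = df.sort_values('length', kind='mergesort')  # stable sort
gap_group = ordered['length'].diff().gt(tolerance).cumsum().rename('gap_group')
gap_groups = ordered.groupby(gap_group)['text'].apply(list).to_dict()

print(rounded_groups)
print(gap_groups)
```

With the sample data, both approaches put aaa, ddd and eee in one group and bbb, ccc and fff in the other. The gap-based version has its own trade-off: lengths are chained together, so a long run of closely spaced values can end up in one group even if its first and last members differ by more than 5.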