Search code examples
pandasgroupingbinning

Pandas value_counts with a little tolerance


I have a set of data and want to get its value_counts result:

df = pd.DataFrame(
    [5.01, 5.01, 5.08, 6.1, 5.54, 6.3, 5.56, 5.55, 6.7],
    columns=['val'])
>>> df
    val
0  5.01
1  5.01
2  5.08
3  6.10
4  5.54
5  6.30
6  5.56
7  5.55
8  6.70
>>> df.val.value_counts()
5.01    2
5.08    1
6.10    1
5.54    1
6.30    1
5.56    1
5.55    1
6.70    1
Name: val, dtype: int64

Is there a way to allow a certain tolerance when using value_counts, such as plus or minus 0.01, so that 5.54, 5.55, and 5.56 in the series are calculated as a group? The result I hope is:

[5.54,5.56,5.55] 3
[5.01] 2
[5.08] 1
[6.10] 1
...

Solution

  • try this:

    mask = df['val'].sort_values().diff().gt(0.01)
    result_df = df.groupby(mask.cumsum())['val'].agg([set, 'count'])
    print(result_df)
    
                        set  count
    val                           
    0                {5.01}      2
    1                {5.08}      1
    2    {5.55, 5.54, 5.56}      3
    3                 {6.1}      1
    4                 {6.3}      1
    5                 {6.7}      1