For each value, find the smallest distance within the group using Pandas

suppose I have the following data frame df

group value
a       1
a       0
a       3
b       1
b       4

For each value, I want to find the closest value in the same group ('a' or 'b')，and then output the difference between the value and the closest value to it.

The output should look like this:

group value min_diff
a       1    1
a       0    1
a       3    2
b       1    3
b       4    3

Thank you!

Solution

You can use a self-merge_asof:

tmp = df.sort_values('value')

out = (pd.merge_asof(
           tmp.reset_index(),
           tmp.rename(columns={'value': 'value_'}),
           by='group',
           left_on='value', right_on='value_',
           direction='nearest',
           allow_exact_matches=False)
         .assign(min_diff=lambda d: d['value'].sub(d.pop('value_')).abs())
         .set_index('index').reindex(df.index)
       )

If you can have duplicated values within a group (see example below) and would like to report 0 in this case, you further need to process the output:

out.loc[df.duplicated(subset=['group', 'value'], keep=False), 'min_diff'] = 0

Output:

  group  value  min_diff
0     a      1         1
1     a      0         1
2     a      3         2
3     b      1         3
4     b      4         3

Intermediate after the merge_asof:

   index group  value  value_
0      1     a      0       1
1      0     a      1       0
2      3     b      1       4
3      2     a      3       1
4      4     b      4       1

handling duplicated values

Because of allow_exact_matches=False, matching the same value is prevented.

Without correction:

  group  value  min_diff
0     a      1         1
1     a      0         1
2     a      3         2
3     b      1         3
4     b      4         3
5     c      8         1
6     c      9         1  # matched with 8
7     c      9         1  #

After correction with out.loc[df.duplicated(subset=['group', 'value'], keep=False), 'min_diff'] = 0:

  group  value  min_diff
0     a      1         1
1     a      0         1
2     a      3         2
3     b      1         3
4     b      4         3
5     c      8         1
6     c      9         0  # matched with each other
7     c      9         0  #