I have the following dataframe where the column value
is sorted:
df = pd.DataFrame({'variable': {0: 'Chi', 1: 'San Antonio', 2: 'Dallas', 3: 'PHL', 4: 'Houston', 5: 'NY', 6: 'Phoenix', 7: 'San Diego', 8: 'LA', 9: 'San Jose', 10: 'SF'}, 'value': {0: 191.28, 1: 262.53, 2: 280.21, 3: 283.08, 4: 290.75, 5: 295.72, 6: 305.6, 7: 357.89, 8: 380.07, 9: 452.71, 10: 477.67}})
Output:
variable value
0 Chi 191.28
1 San Antonio 262.53
2 Dallas 280.21
3 PHL 283.08
4 Houston 290.75
5 NY 295.72
6 Phoenix 305.60
7 San Diego 357.89
8 LA 380.07
9 San Jose 452.71
10 SF 477.67
I want to find values where the distance between neighboring values is smaller than 10:
df['value'].diff() < 10
Output:
0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 False
8 False
9 False
10 False
Name: value, dtype: bool
Now I want to equally space those True
values that are too close to each other. The idea is to take the first value before the True
sequence (280.21) and add 5 to each next True
value (cumulative sum): first True
= 280 + 5, second True
= 280 + 5 + 5, third True
= 280 + 5 + 5...
Expected Output:
variable value
0 Chi 191.28
1 San Antonio 262.53
2 Dallas 280.21
3 PHL 285.21 <-
4 Houston 290.21 <-
5 NY 295.21 <-
6 Phoenix 300.21 <-
7 San Diego 357.89
8 LA 380.07
9 San Jose 452.71
10 SF 477.67
My solution:
mask = df['value'].diff() < 10
df.loc[mask, 'value'] = 5
df.loc[mask | mask.shift(-1), 'value'] = last_day[mask | mask.shift(-1), 'value'].cumsum()
Maybe there is a more elegant one.
Let's try this:
df = pd.DataFrame({'variable': {0: 'Chi', 1: 'San Antonio', 2: 'Dallas', 3: 'PHL', 4: 'Houston', 5: 'NY', 6: 'Phoenix', 7: 'San Diego', 8: 'LA', 9: 'San Jose', 10: 'SF'}, 'value': {0: 191.28, 1: 262.53, 2: 280.21, 3: 283.08, 4: 290.75, 5: 295.72, 6: 305.6, 7: 357.89, 8: 380.07, 9: 452.71, 10: 477.67}})
s = df['value'].diff() < 10
add_amt = s.cumsum().mask(~s) * 5
df_out = df.assign(value=df['value'].mask(add_amt.notna()).ffill() + add_amt.fillna(0))
df_out
Output:
variable value
0 Chi 191.28
1 San Antonio 262.53
2 Dallas 280.21
3 PHL 285.21
4 Houston 290.21
5 NY 295.21
6 Phoenix 300.21
7 San Diego 357.89
8 LA 380.07
9 San Jose 452.71
10 SF 477.67