I want to resample data
column using forward fill ffill
at the frequency of 1min
while grouping df
by id
column:
df
:
id timestamp data
1 1 2017-01-02 13:14:53.040 10.0
2 1 2017-01-02 16:04:43.240 11.0
...
4 2 2017-01-02 15:22:06.540 1.0
5 2 2017-01-03 13:55:34.240 2.0
...
Expected output:
id timestamp data
1 1 2017-01-02 13:14:53.040 10.0
2017-01-02 13:14:54.040 10.0
2017-01-02 13:14:55.040 10.0
2017-01-02 13:14:56.040 10.0
...
2 1 2017-01-02 16:04:43.240 11.0
2017-01-02 16:04:44.240 11.0
2017-01-02 16:04:45.240 11.0
2017-01-02 16:04:46.240 11.0
...
4 2 2017-01-02 15:22:06.540 1.0
2017-01-02 15:22:07.540 1.0
2017-01-02 15:22:08.540 1.0
2017-01-02 15:22:09.540 1.0
...
5 2 2017-01-03 13:55:34.240 2.0
2017-01-03 13:55:35.240 2.0
2017-01-03 13:55:36.240 2.0
2017-01-03 13:55:37.240 2.0
...
Something like this post but I tried:
df.set_index('timestamp').groupby('id').resample('1min').asfreq().drop(['id'], 1).reset_index()
and data
column returned only NaN
values:
id timestamp data
0 1 2017-01-02 13:14:53.040 NaN
1 1 2017-01-02 13:14:54.040 NaN
2 1 2017-01-02 13:14:55.040 NaN
3 1 2017-01-02 13:14:56.040 NaN
4 1 2017-01-02 13:14:57.040 NaN
... ... ... ...
Edit:
df
timestamp
changed from 2017-01-02 12:04:43.240
to 2017-01-02 16:04:43.240
, ie., rows belonging to the same id
should be sorted.Use custom function with define how many need new rows by Timedelta
with date_range
and DataFrame.reindex
:
def f(x):
new = x.index[0] + pd.Timedelta(5, unit='Min')
r = pd.date_range(x.index[0], new, freq='Min')
return x.reindex(r, method='ffill')
df = (df.reset_index()
.set_index('timestamp')
.groupby(['index','id'], sort=False)['data']
.apply(f)
.reset_index(level=0, drop=True)
.rename_axis(['id','timestamp'])
.reset_index()
)
print (df)
id timestamp data
0 1 2017-01-02 13:14:53.040 10.0
1 1 2017-01-02 13:15:53.040 10.0
2 1 2017-01-02 13:16:53.040 10.0
3 1 2017-01-02 13:17:53.040 10.0
4 1 2017-01-02 13:18:53.040 10.0
5 1 2017-01-02 13:19:53.040 10.0
6 1 2017-01-02 12:04:43.240 11.0
7 1 2017-01-02 12:05:43.240 11.0
8 1 2017-01-02 12:06:43.240 11.0
9 1 2017-01-02 12:07:43.240 11.0
10 1 2017-01-02 12:08:43.240 11.0
11 1 2017-01-02 12:09:43.240 11.0
12 2 2017-01-02 15:22:06.540 1.0
13 2 2017-01-02 15:23:06.540 1.0
14 2 2017-01-02 15:24:06.540 1.0
15 2 2017-01-02 15:25:06.540 1.0
16 2 2017-01-02 15:26:06.540 1.0
17 2 2017-01-02 15:27:06.540 1.0
18 2 2017-01-03 13:55:34.240 2.0
19 2 2017-01-03 13:56:34.240 2.0
20 2 2017-01-03 13:57:34.240 2.0
21 2 2017-01-03 13:58:34.240 2.0
22 2 2017-01-03 13:59:34.240 2.0
23 2 2017-01-03 14:00:34.240 2.0
because if use ffill
output is different:
df = df.set_index('timestamp').groupby('id', sort=False)['data'].resample('1min').ffill()
print (df)
id timestamp
1 2017-01-02 12:04:00 NaN
2017-01-02 12:05:00 11.0
2017-01-02 12:06:00 11.0
2017-01-02 12:07:00 11.0
2017-01-02 12:08:00 11.0
2 2017-01-03 13:51:00 1.0
2017-01-03 13:52:00 1.0
2017-01-03 13:53:00 1.0
2017-01-03 13:54:00 1.0
2017-01-03 13:55:00 1.0
Name: data, Length: 1425, dtype: float64