I have a data frame in pandas with a DateTime index.
When grouping it with a time grouper: pd.Grouper(freq='360Min')
, how can I join this result back onto the original timestamp?
I.e. an equijoin timestamp=bucket will not work?
Is there a convenience function?
Should the asof
join be used?
Or would I manually have to extract the hours and then try to match it up?
for a source of
import pandas as pd
df = pd.DataFrame(
{
"Publish date": [
pd.Timestamp("2000-01-02"),
pd.Timestamp("2000-01-02"),
pd.Timestamp("2000-01-09"),
pd.Timestamp("2000-01-16")
],
"ID": [0, 1, 2, 3],
"Price": [10, 20, 30, 40]
}
)
Which gives:
Publish date ID Price
0 2000-01-02 0 10
1 2000-01-02 1 20
2 2000-01-09 2 30
3 2000-01-16 3 40
I want to perform an aggregation with an arbitrary frequency (not only month, day, hour), let's say 1
month.
agg_result = df.groupby(pd.Grouper(key="Publish date", freq="1M")).agg([pd.Series.mean, pd.Series.median]).reset_index()
agg_result.columns = ['_'.join(col).strip() for col in agg_result.columns.values]
agg_result.columns = ['Publish date month', 'ID_mean', 'ID_median', 'Price_mean', 'Price_median']
print(agg_result)
Publish date month ID_mean ID_median Price_mean Price_median
0 2000-01-31 1.5 1.5 25 25
How can I ensure that the equijoin would work again? I.e. transform the original timestamp into the fitting bucket using the same arbitrary frequency?
I.e. described in the code of the example, how can I get:
agg_result['Publish date month'] = agg_result['Publish date'].apply(magic transform to same frequency bucket)
df.merge(agg_result, on['Publish date month'])
To work, i.e. define the transformation to the right bucket?
EDIT:
The easiest way to identify the corresponding original values for each group should be:
gb = df.groupby(pd.Grouper(key="Publish date", freq="1M"))
dict(list(gb['Publish date']))
You can then use this to join any information back to the original table.
Can you just join on two intermediate columns?
df['Publish date'].dt.month
and
df.groupby(pd.Grouper(key="Publish date", freq="1M")).agg([pd.Series.mean, pd.Series.median]).index.month
like this
results = df.groupby(pd.Grouper(key="Publish date", freq="1M")).agg([pd.Series.mean, pd.Series.median])
results.columns = ['-'.join(col[::-1]).strip() for col in results.columns]
df['month'] = df['Publish date'].dt.month
results['month'] = results.index.month
results.merge(df)