I currently have these 2 dataframes (before/after):
Before:
timestamp cnt t1 t2 hum wind_speed is_holiday weather year month day
0 2015-01-04 00:00:00 182 3.0 2.0 93.0 6.0 0 broken clouds 2015 1 4
1 2015-01-04 01:00:00 138 3.0 2.5 93.0 5.0 0 clear 2015 1 4
2 2015-01-04 02:00:00 134 2.5 2.5 96.5 0.0 0 clear 2015 1 4
3 2015-01-04 03:00:00 72 2.0 2.0 100.0 0.0 0 clear 2015 1 4
4 2015-01-04 04:00:00 47 2.0 0.0 93.0 6.5 0 clear 2015 1 4
... ... ... ... ... ... ... ... ... ... ... ...
17337 2016-12-31 19:00:00 493 9.0 6.0 82.0 17.0 0 cloudy 2016 12 31
17338 2016-12-31 20:00:00 465 8.5 6.5 84.5 13.0 0 cloudy 2016 12 31
17339 2016-12-31 21:00:00 474 8.0 6.0 87.0 12.0 0 cloudy 2016 12 31
17340 2016-12-31 22:00:00 587 8.0 6.0 90.0 12.0 0 cloudy 2016 12 31
17341 2016-12-31 23:00:00 697 8.0 5.0 87.0 18.5 0 cloudy 2016 12 31
After:
timestamp cnt
0 2015-01-04 9234
1 2015-01-05 20372
2 2015-01-06 20613
3 2015-01-07 21064
4 2015-01-08 15601
... ... ...
722 2016-12-27 10842
723 2016-12-28 12428
724 2016-12-29 14052
725 2016-12-30 11566
726 2016-12-31 11424
So the point is that I deleted all the unnecessary columns and summed up the cnt values (this was done to drop the hours from the dates and merge each day's rows into a single row holding that day's total cnt).
But there is also a column called 'is_holiday' that is 0 if the day is not a holiday and 1 if it is. The value is repeated on every hourly row of THAT day. I want my 2nd dataframe to have a 3rd column, 'is_holiday', containing 1 or 0.
How do I rework my 1st dataframe so that the 2nd dataframe also includes is_holiday? That would make it easier to find out whether 'cnt', which means 'bicycles rented that day', is higher on holidays or not.
My code:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
from datetime import datetime
import numpy as np
path2 = r'C:\Users\bossd\OneDrive\Документы\pipirka.csv'
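df2 = pd.read_csv(path2, parse_dates=['timestamp'])  # parse timestamps so the .dt accessor works
df2 = df2.groupby(df2["timestamp"].dt.date)["cnt"].sum().reset_index() # Making 2nd dataframe
# After grouping, 'timestamp' holds date objects, so cast to str before using .str
is_2015 = df2['timestamp'].astype(str).str.startswith('2015')
x = df2.loc[is_2015, 'timestamp']
y = df2.loc[is_2015, 'cnt']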
Use .groupby + .agg:
df = (
    df.groupby(df["timestamp"].dt.date)
    .agg({"cnt": "sum", "is_holiday": "first"})
    .reset_index()
)
print(df)
Prints:
timestamp cnt is_holiday
0 2015-01-04 573 0
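To answer the actual question (are rentals higher on holidays?), the daily frame can then be grouped once more by is_holiday. The snippet below is only a minimal sketch on a small made-up sample, not the real CSV: the `hourly`, `daily` and `mean_by_holiday` names and the sample values are assumptions for illustration. It also uses "max" instead of "first" for is_holiday, which gives the same result when the flag is constant within a day but still marks the day as a holiday if any hourly row has 1.

```python
import pandas as pd

# Hypothetical sample shaped like the hourly data in the question (values made up).
hourly = pd.DataFrame({
    "timestamp": ["2015-01-04 00:00:00", "2015-01-04 01:00:00",
                  "2015-01-05 00:00:00", "2015-01-05 01:00:00"],
    "cnt": [182, 138, 100, 50],
    "is_holiday": [1, 1, 0, 0],
})
# .dt only works on datetime columns, so convert the strings first.
hourly["timestamp"] = pd.to_datetime(hourly["timestamp"])

# One row per day: total rentals plus the day's holiday flag.
daily = (
    hourly.groupby(hourly["timestamp"].dt.date)
    .agg(cnt=("cnt", "sum"), is_holiday=("is_holiday", "max"))
    .reset_index()
)

# Average daily rentals on non-holidays (0) versus holidays (1).
mean_by_holiday = daily.groupby("is_holiday")["cnt"].mean()
print(daily)
print(mean_by_holiday)
```

With the real data, replace the sample frame with the one read from the CSV (with the timestamp column parsed as datetime) and compare the two values in mean_by_holiday.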