I have a dataframe df as below:
               start_time                end_time  count
0 2020-02-03 08:42:21.997 2020-02-03 09:34:18.737   3116
1 2020-02-03 09:34:18.837 2020-02-03 10:16:56.583   2557
2 2020-02-03 10:17:00.480 2020-02-03 13:18:51.540  10911
3 2020-02-03 13:18:51.640 2020-02-03 14:01:23.263   2551
4 2020-02-03 14:01:23.363 2020-02-03 14:43:56.977    255
I would like to group by only the date of the start_time column and sum all corresponding count values that fall on the same day. I found a relevant answer in this post, which uses this method:
data.groupby(data.date.dt.year)
However, I received the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-46-7618d5285bb9> in <module>()
1
----> 2 df.groupby(df.date.dt.year) # Adding ['start_time'] will return 'AttributeError: 'Series' object has no attribute 'date''.
3
4
5
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5178 return self[name]
-> 5179 return object.__getattribute__(self, name)
5180
5181 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'date'
What is the problem, and how can I group these non-unique datetime values in the start_time column by date only and sum the values?
Edit:
In fact, I was able to do it with
import datetime
df['date'] = df['start_time'].dt.date # Group by 'date' of 'datetime' column
df.groupby('date').sum() # Sum
But I'd like to know if I could do it directly, probably something more straightforward like a one-liner as shown in the answer in the aforementioned post.
You're super close: the .dt.date accessor is how you get just the date portion of a datetime column (https://www.geeksforgeeks.org/python-pandas-series-dt-date/). Try:
data.groupby(data["start_time"].dt.date)["count"].sum()
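To make that concrete, here is a minimal, self-contained sketch that rebuilds the five rows from the question and applies the one-liner. The dataframe is named data to match this answer (it is df in the question), and daily_counts is just an illustrative variable name.

```python
import pandas as pd

# Rows copied from the question; `data` is the same frame the question calls `df`.
data = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 09:34:18.837",
        "2020-02-03 10:17:00.480",
        "2020-02-03 13:18:51.640",
        "2020-02-03 14:01:23.363",
    ]),
    "end_time": pd.to_datetime([
        "2020-02-03 09:34:18.737",
        "2020-02-03 10:16:56.583",
        "2020-02-03 13:18:51.540",
        "2020-02-03 14:01:23.263",
        "2020-02-03 14:43:56.977",
    ]),
    "count": [3116, 2557, 10911, 2551, 255],
})

# Group on the date part of start_time, then sum only the count column.
daily_counts = data.groupby(data["start_time"].dt.date)["count"].sum()
print(daily_counts)
```

All five sample rows fall on 2020-02-03, so the result has a single entry whose value is the sum of the five counts (19390).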
Here is some background information about the indexing that I think you're missing:
When we write data["start_time"], we are getting the column start_time from your dataframe (called data here, df in your question). An equivalent way of getting this column is data.start_time. When you try to access data.date (which is equivalent to data["date"]), you get an AttributeError because your dataframe does not have a column called date.
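A small sketch of that behaviour, using a one-row frame built only for illustration (same_column and error_message are made-up names). It also shows one caveat worth knowing with this particular data: attribute access does not reach the count column, because data.count resolves to the built-in DataFrame.count method, so brackets are the safer choice there.

```python
import pandas as pd

# One-row frame for illustration only; column names follow the question.
data = pd.DataFrame({
    "start_time": pd.to_datetime(["2020-02-03 08:42:21.997"]),
    "count": [3116],
})

# Bracket access and attribute access return the same column.
same_column = data["start_time"].equals(data.start_time)

# There is no column called "date", so attribute access raises AttributeError.
try:
    data.date
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Caveat: data.count is the DataFrame.count method, not the "count" column.
count_is_method = callable(data.count)

print(same_column)
print(error_message)
print(count_is_method)
```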
If the start_time column has a datetime dtype, then it has an accessor called dt, which in turn exposes the attribute date, and that is what we want to group by. We can access it through data.start_time.dt.date or data["start_time"].dt.date.
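If you are not sure whether the column really is datetime, you can check and convert it first. The sketch below assumes a hypothetical frame called raw whose timestamps were read in as plain strings (for example from a CSV); the second row's date is made up so that two different days appear. On a column that is not datetime-like, .dt raises an AttributeError, which is why the conversion comes first.

```python
import pandas as pd

# Hypothetical input: timestamps stored as strings rather than datetimes.
raw = pd.DataFrame({
    "start_time": ["2020-02-03 08:42:21.997", "2020-02-04 09:00:00.000"],
    "count": [1, 2],
})

# Check the dtype before relying on the .dt accessor.
is_datetime_before = pd.api.types.is_datetime64_any_dtype(raw["start_time"])

# Convert the strings to real datetimes; after this, .dt is available.
raw["start_time"] = pd.to_datetime(raw["start_time"])
is_datetime_after = pd.api.types.is_datetime64_any_dtype(raw["start_time"])

# Each element of `dates` is a plain datetime.date object.
dates = raw["start_time"].dt.date

print(is_datetime_before, is_datetime_after)
print(dates.tolist())
```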
When you write data["date"] = data["start_time"].dt.date, you are creating a new column in your dataframe called date, which holds the date portion of your start_time column. You can now access it through data.date (or data["date"]), which is why your solution works.
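As a quick sanity check, the sketch below compares the one-liner with the two-step approach on a tiny frame. The first two rows come from the question; the third row's date is changed to 2020-02-04 purely for illustration so that there are two groups. The names one_liner, with_column and two_step are made up for this example.

```python
import pandas as pd

# First two rows are from the question; the third date is altered for illustration.
data = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-02-03 08:42:21.997",
        "2020-02-03 09:34:18.837",
        "2020-02-04 10:17:00.480",
    ]),
    "count": [3116, 2557, 10911],
})

# One-liner: group directly on the derived dates, no extra column needed.
one_liner = data.groupby(data["start_time"].dt.date)["count"].sum()

# Two-step version: store the dates in a "date" column, then group by its name.
with_column = data.assign(date=data["start_time"].dt.date)
two_step = with_column.groupby("date")["count"].sum()

print(one_liner.tolist())
print(two_step.tolist())
```

Both approaches give the same sums; the only visible difference should be the name of the resulting index (start_time for the one-liner, date for the two-step version).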