I have a dataset of events with a date
column which I need to display in a weekly plot and do some more data processing afterwards. After some googling I found pd.Grouper(freq="W")
so I am using that to group the events by week and display them. My problem is that after doing the groupby
and ungroup
I end up with a data frame where there is an unnamed column that I am unable to refer to except using iloc
. This is an issue because in later plots I am grouping by other columns so I need a way to refer to this column by name, not iloc.
Here's a reproducible example of my dataset:
from datetime import datetime
from faker import Faker
fake = Faker()
start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 2, 1)
# Generate data frame of 30 random dates in January 2023
df = pd.DataFrame(
{"date": [fake.date_time_between(start_date=start_date, end_date=end_date) for i in range(30)],
"dummy": [1 for i in range(30)]}) # There's probably a better way of counting than this
grouper = df.set_index("date").groupby([pd.Grouper(freq="W"), 'dummy'])
result = grouper['dummy'].count().unstack('dummy').fillna(0)
The result
data frame that I get has weird indexes/columns that I am unable to navigate:
>>> print(result)
dummy 1
date
2023-01-01 1
2023-01-08 3
2023-01-15 4
2023-01-22 9
2023-01-29 8
2023-02-05 5
>>> print(result.columns)
Int64Index([1], dtype='int64', name='dummy')
Then only column here is dummy
, but even after result.dummy
I get an AttributeError
I've also tried result.reset_index()
:
dummy date 1
0 2023-01-01 1
1 2023-01-08 3
2 2023-01-15 4
3 2023-01-22 9
4 2023-01-29 8
5 2023-02-05 5
But for this data frame I can only get the date
column - the counts column named "1" cannot be accessed using result.reset_index()["1"]
as I get an AttributeError
I am completely perplexed by what is going on here, pandas is really powerful but sometimes I find it incredibly unintuitive. I've checked several pages of the docs and checked if there's another index level (there isn't). Can someone who's better at pandas help me out here?
I just want a way to convert the grouped data frame into something like this:
date counts
0 2023-01-01 1
1 2023-01-08 3
2 2023-01-15 4
3 2023-01-22 9
4 2023-01-29 8
5 2023-02-05 5
Where date
and counts
are columns and there is an unnamed index
You can solve this by simply doing:
from datetime import datetime
from faker import Faker
fake = Faker()
start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 2, 1)
# Generate data frame of 30 random dates in January 2023
df = pd.DataFrame(
{"date": [fake.date_time_between(start_date=start_date, end_date=end_date) for i in range(30)],
"dummy": [1 for i in range(30)]}) # There's probably a better way of counting than this
result = df.groupby([pd.Grouper(freq="W", key='date'), 'dummy'], squeeze=True)['dummy'].count()
result = result.reset_index(name='counts')
result = result.drop(['dummy'], axis = 1)
which gives
date counts
0 2023-01-01 3
1 2023-01-08 7
2 2023-01-15 5
3 2023-01-22 5
4 2023-01-29 8
5 2023-02-05 2