Tags: python, pandas, dataframe, pandas-groupby

Group by Date and Column


I have a DataFrame:

Date_time           | Available
21/10/2020 05:00:01 |   Yes
21/10/2020 12:20:01 |   No
22/10/2020 04:30:01 |   Yes
22/10/2020 03:40:01 |   Yes
22/10/2020 01:50:01 |   No
23/10/2020 02:10:01 |   Yes
23/10/2020 11:30:01 |   Yes

I want to replicate SQL's CASE statement and GROUP BY in Python.

SELECT date,
       sum(case when Available = 'Yes' then 1 else 0 end)*100/count(*) as Available_rate 
FROM table
group by date

What I am looking for is:

Date        | Available_rate 
21/10/2020  |   50
22/10/2020  |   66.667
23/10/2020  |   100

So far I am able to get the daily row counts in Python:

daily_count = df.groupby([df['date'].dt.date]).size().reset_index(name='counts')

I am not sure how to go further.


Solution

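  • Convert "Available" to a boolean with .eq('Yes'), then use GroupBy.mean. Because True counts as 1 and False as 0, the mean of the booleans within each group is the fraction of "Yes" rows, and multiplying by 100 gives the rate as a percentage. The code assumes df['Date'] holds only the date part of Date_time: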

    out = (df['Available']
           .eq('Yes')
           .groupby(df['Date'])
           .mean()
           .mul(100)
           )
    

    Output:

    Date
    21/10/2020     50.000000
    22/10/2020     66.666667
    23/10/2020    100.000000
    Name: Available, dtype: float64
    
    Alternative format (a DataFrame with a named rate column instead of a Series):

    out = (df['Available']
           .eq('Yes')
           .groupby(df['Date'])
           .mean()
           .mul(100)
           .rename('available rate')
           .reset_index()
           )
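
For completeness, here is a minimal end-to-end sketch that can be run as-is. It rebuilds the sample data from the question and derives the Date column that the snippets above assume already exists. The day-first timestamp format and the names parsed, is_yes, rate and rate_sql are assumptions made for illustration, not part of the original answer. It also includes a literal translation of the SQL (sum * 100 / count), which should give the same numbers as the mean-based version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date_time": [
        "21/10/2020 05:00:01", "21/10/2020 12:20:01",
        "22/10/2020 04:30:01", "22/10/2020 03:40:01", "22/10/2020 01:50:01",
        "23/10/2020 02:10:01", "23/10/2020 11:30:01",
    ],
    "Available": ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"],
})

# Assumption: the timestamps are day-first strings, so parse them explicitly.
parsed = pd.to_datetime(df["Date_time"], format="%d/%m/%Y %H:%M:%S")

# Keep only the calendar date, formatted as in the desired output.
df["Date"] = parsed.dt.strftime("%d/%m/%Y")

# True where Available == 'Yes', False otherwise.
is_yes = df["Available"].eq("Yes")

# Mean of the booleans per date = share of "Yes" rows; times 100 = percentage.
rate = is_yes.groupby(df["Date"]).mean().mul(100).rename("Available_rate")

# Literal translation of the SQL: sum(case ...) * 100 / count(*).
grouped = is_yes.groupby(df["Date"])
rate_sql = (grouped.sum() * 100 / grouped.size()).rename("Available_rate")

print(rate.round(3).reset_index())
```

One caveat with this sketch: the Date values are strings, so the groups are sorted lexicographically. That is harmless for the sample data, but for data spanning several months it is probably safer to group by parsed.dt.date (or parsed.dt.normalize()) and format the dates only for display.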