Search code examples
mysqlsqlgroup-bypivotdynamic-pivot

MySQL pivot-like operation to get breakdown of percentage of total events per day per event type


There's a table of events

created_at DATETIME
event_type STRING 
# Some other columns with data about the event

What I would like to do is be able to get the percentage of the total number of events per day per event_type.

So, I grouped the events to get the count per day per event:

# Lazily used date_bucket in GROUP BY since it's valid MySQL.
# Is that bad since it's not standard SQL?
#
SELECT 
    DATE(created_at) as date_bucket, 
    event_type, 
    COUNT(*) as number
FROM 
    example_table 
GROUP BY
    date_bucket, event_type

If we had rows

# Columns (date_bucket, event_type, number)
#
('2020-06-02', 'exampleG1', 5)
('2020-06-02', 'exampleG2', 10)
('2020-06-02', 'exampleG3', 20)
('2020-06-03', 'exampleG1', 10)

I would like to be able to get something equivalent in handling to

# Columns (date_bucket, exampleG1, exampleG2, exampleG3)
#
('2020-06-02', 15/35, 10/35, 20/35)
('2020-06-03', 10/10, 0, 0)

I don't know the distinct event_type values before hand, and not all group values may be present on all days, in which case the value for that type should be 0 on that day.

I was thinking to do some kind of pivot operation, but it appears MySQL doesn't support pivots, so I'm at a bit of a loss how to approach this.

If I knew the set of valid event types ahead of time, I think I could do some nasty verbose query on the possible types, but the set is variable.

Is there an elegant way of achieving this?


Solution

  • I don't know the distinct event_type values before hand

    You are asking for dynamic SQL. That is, dynamically build the query string from another query that lists distinct event_type values, then execute it. In MySQL, this is implemented using prepared statements.

    Here is how to do it:

    select @sql := group_concat(distinct
        'sum(case when event_type = ''', 
        event_type, ''' then number else 0 end)/sum(number) as `ratio_', 
        event_type, '`'
    ) 
    from example_table;
    
    set @sql = concat(
        'select date(created_at) date_bucket, ', 
        @sql, 
        ' from example_table group by date(created_at) order by date_bucket'
    );
    
    -- debug
    select @sql;
    
    -- execute
    prepare stmt from @sql;
    execute stmt;
    deallocate prepare stmt; 
    

    For your sample data, this produces the following query:

    select 
        date(created_at) date_bucket, 
        sum(case when event_type = 'exampleG1' then number else 0 end)/sum(number) as `ratio_exampleG1`,
        sum(case when event_type = 'exampleG2' then number else 0 end)/sum(number) as `ratio_exampleG2`,
        sum(case when event_type = 'exampleG3' then number else 0 end)/sum(number) as `ratio_exampleG3` 
    from example_table 
    group by date(created_at) 
    order by date_bucket
    

    And the following result:

    date_bucket | ratio_exampleG1 | ratio_exampleG2 | ratio_exampleG3
    :---------- | --------------: | --------------: | --------------:
    2020-06-02  |          0.1429 |          0.2857 |          0.5714
    2020-06-03  |          1.0000 |          0.0000 |          0.0000
    

    Demo on DB Fiddle