Search code examples
sqlpostgresqlgroup-bypivotrdbms

Solving Complex SQL-Based Aggregation problem with multiple events


Long story short I failed the following test assessment a while ago. Want to finally understand how to solve it.

player_id : int game_id : int event_type : varchar(64) event_time : timestamp

store_game_install: the specified player has performed an initial install of the specified game

store_game_update: the specified player has performed an update of the specified game

store_open: the specified player has launched the game store client on the device store_deleted: the specified player has uninstalled the game store client on the device store_game_download: the specified player has downloaded the game package onto Since actual game install has to be inferred from a store_game_view followed by a store_game_download.

On the other hand, later versions of the store client have stopped producing the store_game_download event.

I had to construct a single MySql/PostgreSQL statement producing the following aggregation:

game_id => game id

game_views => Total count of game detail views for this

game_installs => Total count of game installs for this game

ANY help would mean the world. Thanks!


Solution

  • You can do conditional aggregation.

    In Postgres:

    select
        app_id,
        count(*) filter(where event_type = 'store_app_view') app_views,
        count(*) filter(where event_type = 'store_app_install') app_installs
        count(*) filter(where event_type = 'store_app_install')
            / count(*) filter(where event_type = 'store_app_view') app_conversion_rate
    from events_db
    group by app_id
    

    Implementing the logic to infer store installs for old versions of the app is a bit more complex. For this, one method is to use window functions:

    select
        app_id,
        count(*) filter(where event_type = 'store_app_view') app_views,
        count(*) filter(where event_type = 'store_app_install' or event_type = 'store_app_view' and lead_event_type = 'store_app_download') app_installs
        count(*) filter(where event_type = 'store_app_install' or event_type = 'store_app_view' and lead_event_type = 'store_app_download')
            / count(*) filter(where event_type = 'store_app_View') app_conversion_rate
    
    from (
        select
            e.*,
            lead(event_type) over(partition by user_id order by event_type) lead_event_type
        from events_db e
    ) t
    group by app_id