Search code examples
sqlpostgresqlviewquery-optimizationmaterialized-views

Postgresql View aggregating data


I've got a problem to make this "nice" and efficient, as well as easy to read. Unfortunately it's lacking both properties.

Given a table with date,transaction_type,username and credits, I want to generate a view which summarizes into these fields: date_from,date_next,username,credits_total,credits_total

Explanation:

  • date_from and date_to are this first of a month and the next first of a month. (e.g. 2022-06-01 and 2022-07-01)
  • username is grouped and so only one same username per date_from/date_next pair
  • credits_total is the sum of credit_change where transaction = 'charge'
  • credits_left is credits_total - sum(credits_change where transaction_type = 'usage')

I've identified multiple problems and were partially able to fix them:

  • date_from/_next is easy with date_trunc('month', date) and `date_trunc('month', date) + interval '1 month''
  • group username/dates is possible with group by
  • making credits_total without duplication is hard. Or is subquery the only solution?
  • credits left is almost the same, but with other transaction_type and subtracting from credits_total. How Can I reuse the credits_total?

What I came up with (and am very unhappy with)

The source table:

create table usage  -- simplified
(
    datetime timestamp default now() not null,
    transaction_type varchar(16) not null,
    user varchar(128) not null,
    credits_change int not null,
);

My code for the view:

CREATE MATERIALIZED VIEW token_usage
AS
SELECT 
       -- trivial:
       user,
       date_trunc('month', datetime) as date_from,
       date_trunc('month', datetime) + interval '1 month' as date_next,

       -- sum of credits_change with requirement + duplication
       (    -- see here. first time
            SELECT sum(credits_change)
            FROM usage
            WHERE transaction_type = 'charge'
            AND datetime BETWEEN date_trunc('month', datetime) AND date_trunc('month', datetime) + interval '1 month'
       ) as credits_total,

       -- sum of credits change minus other sum and more duplication
       (   -- see here. using the same again
           SELECT sum(credits_change)
           FROM usage
           WHERE transaction_type = 'charge'
           AND datetime BETWEEN date_trunc('month', datetime) AND date_trunc('month', datetime) + interval '1 month'
       ) - (  -- see here. using the same again, but with different transaction_type
           SELECT sum(credits_change)
           FROM usage
           WHERE transaction_type = 'usage'
           AND datetime BETWEEN date_trunc('month', datetime) AND date_trunc('month', datetime) + interval '1 month'
       ) as credits_left
    FROM usage
    GROUP BY user_name, datetime, datetime_next_start
WITH DATA;

Tbh it seems I'm just missing some postgresql tool, to make this better.

Thanks for the help :)


Solution

  • Without knowing some sample data and expected output to try the query, the following can surely taken as a sketch for your complete solution. I guess, the main point here is knowing about the FILTER clause for aggregation functions(*):

    CREATE MATERIALIZED VIEW token_usage AS
    
    SELECT
        user,
        date_trunc('month', datetime) as date_from,
        date_trunc('month', datetime) + interval '1 month' as date_next,
        
        SUM(credits_change) FILTER (WHERE transaction_type = 'charge') as credits_total,
        SUM(credits_change) FILTER (WHERE transaction_type = 'charge')
            - SUM(credits_change) FILTER (WHERE transaction_type = 'usage') as credits_left
    FROM usage
    GROUP BY 1, 2, 3
    

    Alternative with less duplication but maybe less readable due to subquery:

    CREATE MATERIALIZED VIEW token_usage AS
    
    SELECT 
        user,
        date_from,
        date_from + interval '1 month' as date_next,
        credits_total,
        credits_total - credits_usage as credits_left
    FROM (
        SELECT
            user,
            date_trunc('month', datetime) as date_from,
        
            SUM(credits_change) FILTER (WHERE transaction_type = 'charge') as credits_total,
            SUM(credits_change) FILTER (WHERE transaction_type = 'usage') as credits_usage
        FROM usage
        GROUP BY 1, 2 
    ) s
    

    *) You can use a CASE clause instead of the FILTER as well:

    SUM(abc) FILTER (WHERE condition)
    
    -- generally the same as
    
    SUM(
        CASE WHEN condition THEN 
            abc 
        END
    )