Search code examples
sqlpostgresqlpivotwindow-functions

Window function inside a pivot, is it possible?


I have this table

enter image description here

And I need to create a pivot, that displays emotions as columns, with average emotion_level grouped by user_id, user_date, emotion. For example, for user_id = 1, user_date = 2011-07-13 and emotion = 'Anger', the average emotion_level should be 4.0.

I create a pivot:

select USER_ID, user_date,  
AVG(case emotion when 'Anger' then convert(float, emotion_level) else 0 end)  as Anger,
AVG(case emotion when 'Sadness' then convert(float, emotion_level) else 0 end) as Sadness,
AVG(case emotion when 'Interest' then convert(float, emotion_level) else 0 end) as Interest
from emotions group by USER_ID, user_date;

Which half-works, but calculates average emotion_level among all emotions, but not for emotions grouped by user, date and emotion.

My result for first user + emotion = 'Anger' = 2, but it should be 4.

enter image description here

I guess, I should use the window function (over (partition by user_id, user_date, emotion)), but can't get the syntax run.

Is it possible at all?

I'm using PostgreSQL 9 in prod, but the above example is written in SQL Server.


Solution

  • WITH    q (id, user_id, user_date, emotion, emotion_level) AS
            (
            VALUES
            (1, 1, '2011-07-13'::DATE, 'Anger', 3),
            (2, 1, '2011-07-13'::DATE, 'Anger', 5),
            (3, 1, '2011-07-13'::DATE, 'Sadness', 2),
            (4, 1, '2011-07-13'::DATE, 'Interest', 2),
            (5, 2, '2011-07-13'::DATE, 'Anger', 1),
            (6, 2, '2011-07-13'::DATE, 'Sadness', 4),
            (7, 2, '2011-07-13'::DATE, 'Sadness', 5),
            (8, 2, '2011-07-13'::DATE, 'Interest', 3),
            (9, 3, '2011-07-13'::DATE, 'Anger', 1),
            (10, 3, '2011-07-13'::DATE, 'Sadness', 3),
            (11, 3, '2011-07-13'::DATE, 'Interest', 4),
            (12, 3, '2011-07-13'::DATE, 'Interest', 5)
            ) 
    SELECT  user_id, user_date,
            AVG(CASE emotion WHEN 'Anger' THEN emotion_level END)::numeric(3, 2) AS Anger,
            AVG(CASE emotion WHEN 'Sadness' THEN emotion_level END)::numeric(3, 2) AS Sadness,
            AVG(CASE emotion WHEN 'Interest' THEN emotion_level END)::numeric(3, 2) AS Interest
    FROM    q
    GROUP BY
            user_id, user_date
    ORDER BY
            user_id, user_date
    

    The problem was that the expressions you originally used:

    AVG(case emotion when 'Interest' then convert(float, emotion_level) else 0 end)
    

    averaged over all records for a given user at given date and treated non-Interest entries as 0, while they should be treated as NULL so they would not contribute to Interest average.