Search code examples
postgresqlgroup-bycalendarcross-joinnot-exists

Postgres SQL query grouped by gender. How to show values where a row does not exist?


I have a query that counts total messages sent by day, and groups them by gender.

If on a day there were no messages sent to males, there will be no row for male.

How do I insert a count value of 0 for males in this case?

Query:

SELECT d.date dated, count(se.id), gender
FROM    (
        select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date
        FROM generate_series(0, 365, 1) AS offs
        ) d
    LEFT OUTER JOIN (
        SELECT id, customer_id, client_report.insert_time, profile.gender, profile.house_income, profile.address AS postcode, profile.age AS age_group, profile.is_employed, profile.is_married, profile.no_children, profile.no_cars, profile.shopping_frequency
        FROM common.client_report
        JOIN common.profile
        ON client_report.profile_id = profile.uuid 
        WHERE sms_status = 'SUCCESS' 
        ) se
ON (d.date=to_char(date_trunc('day', se.insert_time), 'YYYY-MM-DD'))
WHERE customer_id = 25::int AND d.date::date BETWEEN '2017-07-03'::date AND '2017-08-01'::date
GROUP BY d.date, gender
ORDER BY d.date ASC

I want the results to be consistent, such as:

    day, count, gender
    2017-07-01, 10, Female
    2017-07-01, 5, Male
    2017-07-02, 0, Female
    2017-07-02, 8, Male

Solution

  • For the sake of simplicity, let's assume you've made a table with genders:

    CREATE TABLE genders
    (
        gender text primary key
    ) ;
    
    INSERT INTO genders (gender)
    VALUES
       ('MALE'),
       ('FEMALE') ;
    

    If you want all dates/genders in your answer, you need to make a cross join of all possible dates and genders, then LEFT JOIN it with the data that actually answers your question:

    SELECT
        calendar.cal_date AS date, coalesce(gender_count, 0) AS gender_count, genders.gender
    FROM
        -- Virtual table with all dates/genders
        genders
        CROSS JOIN
        (SELECT 
            generate_series(/*now()*/ date '2017-07-10' - interval '7 days' /* or '1 year' */, 
                            /*now()*/ date '2017-07-10', 
                            interval '1 day')::date AS cal_date
        ) AS calendar
    
        -- The table with the actual data
        LEFT JOIN
        (
        SELECT
            date, gender, count(gender) AS gender_count
        FROM
            client_report
            JOIN profile ON profile.uuid = client_report.profile_id
        WHERE
            -- conditions to actually select data
            customer_id = 25
            AND sms_status = 'SUCCESS'
        GROUP BY
            date, gender
        ) AS counts 
            ON counts.date = calendar.cal_date AND counts.gender = genders.gender
    ORDER BY
        calendar.cal_date, genders.gender ;
    

    You'll get an answer like:

    date       | gender_count | gender
    :--------- | -----------: | :-----
    2017-07-03 |            1 | FEMALE
    2017-07-03 |            2 | MALE  
    2017-07-04 |            2 | FEMALE
    2017-07-04 |            0 | MALE  
    2017-07-05 |            0 | FEMALE
    2017-07-05 |            0 | MALE  
    2017-07-06 |            0 | FEMALE
    2017-07-06 |            2 | MALE  
    2017-07-07 |            2 | FEMALE
    2017-07-07 |            0 | MALE  
    2017-07-08 |            0 | FEMALE
    2017-07-08 |            1 | MALE  
    2017-07-09 |            0 | FEMALE
    2017-07-09 |            0 | MALE  
    2017-07-10 |            0 | FEMALE
    2017-07-10 |            0 | MALE  
    

    Note that I haven't taken a full year, but just 8 days.

    You can see the full example with sample data at dbfiddle here

    NOTE: I have ignored the insert_date, and concentrated only on the date, because I don't know which exact meaning each has. I think your point was how to use a cross join, the rest is not that relevant to the scenario.