Search code examples
postgresqlnullgrouping-sets

How to remove null values from a postgres grouping sets query


I wanted to do multiple independent group-bys in a single query, so that I could get the group wise count of different columns in a single query. I achieved that using the following sample query:

SELECT  model, count(model), os_version,count(os_version), timezone, count(timezone)
FROM device
GROUP BY
GROUPING SETS(
(model),(os_version),(timezone)
);

The sample result of the query is as follows:

"model" : "model A", "count" : 14, "os_version" : null, "count" : 14, "timezone" : null, "count" : 14
"model" : "model B", "count" : 6, "os_version" : null, "count" : 6, "timezone" : null, "count" : 6
"model" : null, "count" : 12, "os_version" : "Win 10", "count" : 12, "timezone" : null, "count" : 12
"model" : null, "count" : 8, "os_version" : "Mac OS", "count" : 8, "timezone" : null, "count" : 8
"model" : null, "count" : 5, "os_version" : null, "count" : 5, "timezone" : "GMT", "count" : 5
"model" : null, "count" : 15, "os_version" : null, "count" : 15, "timezone" : "EST", "count" : 15

From the result, you can see that it first groups by on the model column, then os_version and then timezone. But in this process, it also introduces null values for the columns on which the grouping is not currently happening.

I wanted know if there is any way to get the results without any null values in them without writing any custom parser. For example, the target result I am expecting is:

"model" : "model A","count" : 14,
"model" : "model B", "count" : 6,
"os_version" : "Win 10", "count" : 12,
"os_version" : "Mac OS", "count" : 8,
"timezone" : "GMT", "count" : 5,
"timezone" : "EST", "count" : 15

Solution

  • You can wrap the query with an outer query:

    SELECT CASE WHEN model      IS NOT NULL THEN 'model'
                WHEN os_version IS NOT NULL THEN 'os_version'
                WHEN timezone   IS NOT NULL THEN 'timezone'
           END AS category,
           coalesce(model, os_version, timezone) AS value,
           count
    FROM (SELECT model, os_version, timezone, count(*)
          FROM device
          GROUP BY GROUPING SETS ((model),(os_version),(timezone))
         ) AS q;