Search code examples
sqlpostgresqlaggregategreatest-n-per-group

Group and count by another columns value


I have a table like below:

CREATE TABLE public.test_table
(
    "ID" serial PRIMARY KEY NOT NULL,
    "CID" integer NOT NULL,
    "SEG" integer NOT NULL,
    "DDN" character varying(3) NOT NULL
)

and data looks like this:

ID  CID SEG DDN
1   1   1   "711"
2   1   2   "800"
3   1   3   "124"
4   2   1   "711"
5   3   1   "711"
6   3   2   "802"
7   4   1   "799"
8   5   1   "799"
9   5   2   "804"
10  6   1   "799"

I need to group these data by CID column and get column counts depends on DDN columns first values but counts must give me two different information, if it's more than 1 or not.

I'm really sorry if couldn't explains clearly. Let me show you what I need..

DDN END TRA
711 1   2
799 2   1

As you can see, DDN:711 has 1 record of single count (ID:4). This is END column. But 2 times has multiple SEG count (ID:1to3 and ID:5to6). This is TRA column.

I can not be sure what column should be in group clause!

My solution:

Just found a solution like below

WITH x AS (
    SELECT
        (SELECT t1."DDN" FROM public.test_table AS t1
         WHERE t1."CID"=t."CID" AND t1."SEG"=1) AS ddn,
        COUNT("CID") AS seg_count       
    FROM public.test_table AS t
    GROUP BY "CID"
)

SELECT ddn, COUNT(seg_count) AS "TOTAL",    
    SUM(CASE WHEN x.seg_count=1 THEN 1 ELSE 0 END) as "END",
    SUM(CASE WHEN x.seg_count>1 THEN 1 ELSE 0 END) as "TRA"
FROM x
GROUP BY ddn;

Solution

  • Equivalent, faster query:

    SELECT "DDN"
         , COUNT(*) AS "TOTAL"
         , COUNT(*) FILTER (WHERE seg_count = 1) AS "END"
         , COUNT(*) FILTER (WHERE seg_count > 1) AS "TRA"
    FROM  (
       SELECT DISTINCT ON ("CID")
             "DDN"          -- assuming min "SEG" is always 1
           , COUNT(*) OVER (PARTITION BY "CID") AS seg_count
       FROM   test_table
       ORDER  BY "CID", "SEG"
       ) sub
    GROUP  BY "DDN";
    

    db<>fiddle here

    Notes

    CTEs are typically slower and should only be used where needed in Postgres.

    This is equivalent to the query in the question assuming that the minimum "SEG" per "CID" is always 1 - since this query returns the row with the minimum "SEG" while your query returns the one with "SEG" = 1. Typically, you would want the "first" segment and my query implements this requirement more reliably, but that's not clear from the question.

    COUNT(*) is slightly faster than COUNT(column) and equivalent while not involving NULL values (applicable here). Related:

    About DISTINCT ON:

    The aggregate FILTER syntax requires Postgres 9.4+: