sql postgresql aggregate greatest-n-per-group

Group and count by another columns value

I have a table like below:

CREATE TABLE public.test_table
(
    "ID" serial PRIMARY KEY NOT NULL,
    "CID" integer NOT NULL,
    "SEG" integer NOT NULL,
    "DDN" character varying(3) NOT NULL
)

and data looks like this:

ID  CID SEG DDN
1   1   1   "711"
2   1   2   "800"
3   1   3   "124"
4   2   1   "711"
5   3   1   "711"
6   3   2   "802"
7   4   1   "799"
8   5   1   "799"
9   5   2   "804"
10  6   1   "799"

I need to group these data by CID column and get column counts depends on DDN columns first values but counts must give me two different information, if it's more than 1 or not.

I'm really sorry if couldn't explains clearly. Let me show you what I need..

DDN END TRA
711 1   2
799 2   1

As you can see, DDN:711 has 1 record of single count (ID:4). This is END column. But 2 times has multiple SEG count (ID:1to3 and ID:5to6). This is TRA column.

I can not be sure what column should be in group clause!

My solution:

Just found a solution like below

WITH x AS (
    SELECT
        (SELECT t1."DDN" FROM public.test_table AS t1
         WHERE t1."CID"=t."CID" AND t1."SEG"=1) AS ddn,
        COUNT("CID") AS seg_count       
    FROM public.test_table AS t
    GROUP BY "CID"
)

SELECT ddn, COUNT(seg_count) AS "TOTAL",    
    SUM(CASE WHEN x.seg_count=1 THEN 1 ELSE 0 END) as "END",
    SUM(CASE WHEN x.seg_count>1 THEN 1 ELSE 0 END) as "TRA"
FROM x
GROUP BY ddn;

Solution

Equivalent, faster query:

SELECT "DDN"
     , COUNT(*) AS "TOTAL"
     , COUNT(*) FILTER (WHERE seg_count = 1) AS "END"
     , COUNT(*) FILTER (WHERE seg_count > 1) AS "TRA"
FROM  (
   SELECT DISTINCT ON ("CID")
         "DDN"          -- assuming min "SEG" is always 1
       , COUNT(*) OVER (PARTITION BY "CID") AS seg_count
   FROM   test_table
   ORDER  BY "CID", "SEG"
   ) sub
GROUP  BY "DDN";

db<>fiddle here

Notes

CTEs are typically slower and should only be used where needed in Postgres.

This is equivalent to the query in the question assuming that the minimum "SEG" per "CID" is always 1 - since this query returns the row with the minimum "SEG" while your query returns the one with "SEG" = 1. Typically, you would want the "first" segment and my query implements this requirement more reliably, but that's not clear from the question.

COUNT(*) is slightly faster than COUNT(column) and equivalent while not involving NULL values (applicable here). Related:

PostgreSQL: running count of rows for a query 'by minute'

About DISTINCT ON:

Select first row in each GROUP BY group?

The aggregate FILTER syntax requires Postgres 9.4+:

Conditional SQL count