
Need to get cumulative count distinct by periods (year,month)


I have managed to get a cumulative distinct count by month, but I can't get it to work when the count has to restart for each year. I tried adding a PARTITION BY clause but keep getting all kinds of errors. This is what I have so far. Could you give me some guidance? I'm new at this.

[SQL Fiddle][1]

PostgreSQL 9.3 Schema Setup:

    create table table2 (id serial, fecha_captura date, mn integer, VALUACION1 integer);

    insert into table2 (fecha_captura, mn, VALUACION1)
    values
    ('1996-02-01',2,12345),
    ('1996-02-01',2,12345),
    ('1996-02-01',2,12344),
    ('1996-02-01',2,12343),
    ('1996-03-01',3,12345),
    ('1996-03-01',3,12346),
    ('1996-03-01',3,12347),
    ('1996-03-01',3,12348),
    ('1996-04-01',4,12349),
    ('1996-04-01',4,12350),
    ('1996-04-01',4,12351),
    ('1996-04-01',4,12352),
    ('1997-03-01',4,12353),
    ('1997-03-01',4,12354),
    ('1997-03-01',4,12345),
    ('1997-03-01',4,12352),
    ('1997-04-01',4,12353),
    ('1997-04-01',4,12354),
    ('1997-04-01',4,12345),
    ('1997-04-01',4,12355);

Query 1:

    SELECT Year1,mn,
       COUNT(DISTINCT(VALUACION1)) AS Val_Dist_Mes
      ,SUM(flag) AS Val_Dist_Mes_Nuevas
      ,SUM(SUM(flag)) 
       OVER (partition by Year1 ORDER BY mn,Year1
             ROWS UNBOUNDED PRECEDING) AS Val_Dist_Acum 
    FROM
     (
       SELECT
          VALUACION1
         ,EXTRACT(MONTH FROM fecha_captura) AS mn
         ,EXTRACT(YEAR FROM fecha_captura) AS Year1
         ,CASE
             WHEN EXTRACT(MONTH FROM fecha_captura) 
                  = MIN (EXTRACT(MONTH FROM fecha_captura))
                  OVER (PARTITION BY VALUACION1) 
             THEN 1 
             ELSE 0 
          END AS flag
       FROM table2
       GROUP BY fecha_captura,VALUACION1
     ) AS dt
    GROUP BY mn,Year1

[Results][2]:

| year1 | mn | val_dist_mes | val_dist_mes_nuevas | val_dist_acum |
|-------|----|--------------|---------------------|---------------|
|  1996 |  2 |            3 |                   3 |             3 |
|  1996 |  3 |            4 |                   3 |             6 |
|  1996 |  4 |            4 |                   3 |             9 |
|  1997 |  3 |            4 |                   3 |             3 |
|  1997 |  4 |            4 |                   1 |             4 |

[Expected results][2]:

| year1 | mn | val_dist_mes | val_dist_mes_nuevas | val_dist_acum  |
|-------|----|--------------|---------------------|----------------|
|  1996 |  2 |            3 |                   3 |             3  |
|  1996 |  3 |            4 |                   3 |             6  |
|  1996 |  4 |            4 |                   4 |             10 |
|  1997 |  3 |            4 |                   4 |             4  |
|  1997 |  4 |            4 |                   1 |             5  |
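
As the expected table implies, a value counts as new in the first month of each year in which it appears, and the running total restarts every year. A minimal sketch of that definition against the `table2` sample above follows. The aliases `first_seen` and `first_mn` are illustrative names that do not appear in the original query, and a month with no new values would produce no row here.

```sql
-- Sketch: count each valuacion1 once per year, in the first month it appears.
SELECT year1,
       first_mn AS mn,
       COUNT(*) AS val_dist_mes_nuevas,
       -- running total of new values, restarted for every year
       SUM(COUNT(*)) OVER (PARTITION BY year1
                           ORDER BY first_mn) AS val_dist_acum
FROM (
    -- one row per (year, value) with the first month the value was seen
    SELECT EXTRACT(YEAR FROM fecha_captura)       AS year1,
           valuacion1,
           MIN(EXTRACT(MONTH FROM fecha_captura)) AS first_mn
    FROM table2
    GROUP BY EXTRACT(YEAR FROM fecha_captura), valuacion1
) AS first_seen
GROUP BY year1, first_mn
ORDER BY year1, first_mn;
```

On the sample data this should give 3, 3 and 4 new values for 1996 (running total 3, 6, 10) and 4 and 1 for 1997 (running total 4, 5), which agrees with the expected table.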

Solution

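  • The outer query already groups and partitions by year. What is missing is the year in the PARTITION BY of the inner MIN(...) window: without it, the flag marks the lowest month number of a value across all years, so a value seen in more than one year is counted as new only once. Partitioning by both the value and the year fixes the problem: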

    SELECT AÑO, 
           mn, 
           Count(DISTINCT( valuacion1 ))             AS Val_Dist_Mes, 
           Sum(flag)                                 AS Val_Dist_Mes_Nuevas, 
           Sum(Sum(flag)) 
             OVER ( 
               partition BY AÑO 
               ORDER BY mn rows UNBOUNDED PRECEDING) AS Val_Dist_Acum 
    FROM   (SELECT valuacion1, 
                   Extract(month FROM fecha_captura) AS mn, 
                   Extract(year FROM fecha_captura)  AS AÑO, 
                   CASE -- flag the first month of each year in which a value appears 
                     WHEN Extract(month FROM fecha_captura) = Min ( 
                          Extract(month FROM fecha_captura)) 
                          OVER ( 
                            partition BY valuacion1, 
                                         Extract(year FROM fecha_captura)) THEN 1 
                     ELSE 0 
                   END                               AS flag 
            FROM   table2 
            GROUP  BY fecha_captura, 
                      valuacion1) AS dt 
    GROUP  BY mn, 
              AÑO
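
To see why the extra partition key matters, the two window definitions can be compared side by side. This is an illustrative query only, not part of the original answer. The column aliases `first_mn_overall` and `first_mn_in_year` are made-up names, and the filter picks the two sample values (12345 and 12352) that appear in both years.

```sql
-- Compare the first month per value across all years with the
-- first month per value within each year.
SELECT DISTINCT
       valuacion1,
       fecha_captura,
       MIN(EXTRACT(MONTH FROM fecha_captura))
         OVER (PARTITION BY valuacion1) AS first_mn_overall,
       MIN(EXTRACT(MONTH FROM fecha_captura))
         OVER (PARTITION BY valuacion1,
                            EXTRACT(YEAR FROM fecha_captura)) AS first_mn_in_year
FROM table2
WHERE valuacion1 IN (12345, 12352)
ORDER BY valuacion1, fecha_captura;
```

For 12352, for example, `first_mn_overall` should be 3 (taken from 1997-03-01), so its 1996-04-01 row is never flagged by a window partitioned on the value alone. `first_mn_in_year` should be 4 for 1996 and 3 for 1997, so the value is counted as new once in each year.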