Search code examples
mysqlsqlpsqlpostgrest

PSQL : How to get the record count of each value in a column group by another column


PSQL: My main table has 20 million records and when I run my select it runs for hours. Is there a way to better write this statment ?

My table :

Select * from lookup limit 10;
-------------------------
month      id
2010-01     598362
2010-01     598343
2010-02     598343
2010-02     988343
2010-03     789624
2010-04     789624
2010-05     789624
2010-06     899624 

From the table I am trying to find

  1. Count of distinct ids for that month
  2. Count of distinct id's that were also present in previous 2 months

My select statement (below) works fine with small data (upto 100,000 records)

--PSQL
select  month, 
    count (distinct id)id_ct,
    count (distinct case when (
                    id||(month-1) in (select distinct id||month from lookup ) 
                or  id||(month-2) in (select distinct id||month from lookup )  ) 
                    then id end) continuous_ct
    from lookup
    group by 1  order by 1

Result :

month     id_ct continuous_ct
 2010-01    2   0
 2010-02    2   1
 2010-03    1   0
 2010-04    1   1
 2010-05    1   1
 2010-06    1   0

Thanks!


Solution

  • If you have an index on the id then this simple query should work. Just left join the table to itself. http://sqlfiddle.com/#!15/133a9/7/0:

    select to_char(lookup_a.lookup_month, 'yyyy-mm'),
        count (distinct lookup_a.id) id_ct,
        count (distinct lookup_b.id) continuous_ct
    from lookup lookup_a
    left join lookup lookup_b 
        on lookup_b.id = lookup_a.id
        and lookup_b.lookup_month between lookup_a.lookup_month - interval '2 month'
        and lookup_a.lookup_month - interval '1 month'
    group by 1  
    order by 1