Search code examples
postgresqlcountcategorical-data

How effeciently count categories (250) of a categorical attribute? PostgreSQL or Python


I have a big database with 50 attributes (8 categorical) and I need to create a summary with a count of all categories of each variable grouped by city and state. One of the attributes has over 250 categories.

So far I was able to create a query that counts one category per attribute at the time grouped by city and exported to csv.

(select city as "City", COUNT(use4) as "use2056"
from demo
where use4 = '2056'
group by city
order by city asc)

I was thinking about manually copying and pasting (I know it will take forever) but I get outputs with different rows. Also, there are cities with the US with the same name (I will eventually need to visualize it). I tried to use several SELECT per query but I cannot make it work.


Select
(select city as "City", COUNT(use4) as "use2056"
from demo
where use4 = '2056'
group by city
order by city asc),
(COUNT(use4) as "use2436"
from demo
where use4 = '2436'
group by city
order by city asc),
(COUNT(use4) as "use9133"
from demo
where use4 = '9133'
group by city
order by city asc)

I also tried to add the city and county and additional counts

(select zip as "ZIPCODE", city, county, COUNT(use4) as "Use4count1466", COUNT(use4) as "Use4count9133"
from demo
where use4 = '1466',
where use4 = '9133' 
group by zip, city, county
order by zip asc)

is there anyway to do this efficiently? create a loop that keeps counting every category of each attribute? How many SELECT can you have in a query? I need to find a way to display zipcode, county, city and count all the categories of each categorical attribute.


Solution

  • You can use filtered aggregation to do this in a single query:

    select city, 
           count(*) filter (where use4 = '2056') as use2056,
           count(*) filter (where use4 = '2436') as use2436,
           count(*) filter (where use4 = '9133') as use9133,
    from demo
    where use4 in ('2056', '2436', '9133')
    group by city;
    

    You can apply the same for the second query:

    select zip as "ZIPCODE", 
           city, 
           county, 
           count(*) filter (where use4 = '1466') as use4count1466, 
           count(*) filter (where use4 = '9133') as use4count9133
    from demo
    where use4 in ('1466','9133')
    group by zip, city, county