Search code examples
sqlcountamazon-redshiftaginity

How to get distinct counts for values in columns


I have a table and two of it's columns look like this:

Name        cars_owned
Adam        Audi
David       BMW
Steve       Ford
John        Audi
Adam        Audi
....This continues on for ~1,000 rows

I want an output like this:

cars_owned     count
Audi           3
BMW            1
Ford           1
...

The issue is rows can be repeated, meaning 'Adam' and 'Audi' might appear together 100 times but for the name 'Adam' I want it to just count as one for the Audi counts regardless of however many times it appears in the table. Additionally 'Adam' might appear with 'Audi' 20 times but also appear with 'BMW' 5 times, in this case I want 'Adam' to only get into the Audi count as that appears more times than with BMW

I was able to get a count of the distinct names (247) using this code:

SELECT COUNT(DISTINCT Name)
FROM cars_table

Now I just need to break those 247 distinct names into the correct cars_owned categories to get the counts


Solution

  • You want the most common value for each name and then the distribution. Use two levels of aggregation:

    select car, count(*) as num_names, sum(cnt) as num_cars
    from (select name, car, count(*) as cnt,
                 row_number() over (partition by name order by count(*) desc) as seqnum
          from cars c
          group by name, car
         ) nc
    where seqnum = 1
    group by car;
    

    Note: If there is a tie for the most common car for a name, this returns one arbitrary value.