Search code examples
sqlpostgresql

show top 10 occurring values in column


Looking to know how to query the top 10 occurrences of a value in a column .

Schema as follows:

CREATE TABLE Donations (
  id varchar(255) PRIMARY KEY,
  charityId varchar(255) NOT NULL,
  amount integer, 
  createdAt timestamp,
  updatedAt timestamp
);

Imagine this table has 1m records. Looking to know how I can get an output that would return rows that contain the the top 10 charityId's across all records; along with the total number of records that contain that charityId.

Example data would be something like the following:

| id   | charity id  | amount | created at                   | updated at                   |
|------|-------------|--------|------------------------------|------------------------------|
| "1"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "2"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "3"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "4"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "5"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "6"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "7"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "8"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "9"  | "charity-1" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "10" | "charity-2" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "11" | "charity-2" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "12" | "charity-2" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "13" | "charity-2" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "14" | "charity-3" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "15" | "charity-3" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "16" | "charity-3" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "17" | "charity-4" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "18" | "charity-4" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "19" | "charity-5" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "20" | "charity-5" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |
| "21" | "charity-6" | 100    | "2022-03-21 04:08:53.848451" | "2022-03-21 04:08:53.848451" |

An ideal output would be something like:

| charity id  | occurrences                  |
|-------------|------------------------------|
| "charity-1" | 9                            |
| "charity-2" | 4                            |
| "charity-3" | 3                            |
| "charity-4" | 2                            |
| "charity-5" | 2                            |

Solution

  • Since no tiebreaker has been defined, return results "with ties":

    SELECT charityid, count(*) AS occurrencs
    FROM   donations
    GROUP  BY charityid
    ORDER  BY occurrencs DESC  -- no tiebeaker?
    FETCH  FIRST 10 ROWS WITH TIES;
    

    See:

    And use count(*) here. Faster, and equivalent while no null values are involved.