Search code examples
sqlsnowflake-cloud-data-platformgrouping

How to group states into regions when there is no region table


My data set has states as a column but no region column to group the states by. I would like to group the states into standard census bureau regions to get the count of employee IDs by region:

enter image description here

Select COUNT(DISTINCT Empl_ID) AS Employee_Count,
STATE
FROM Employee_Table
GROUP BY STATE

I tried exporting the query and then sorting in Excel but the dataset is too large for Excel.


Solution

  • Create a table based on the data provided in the image:

    create table region_state (region varchar(30), state varchar(30));
    insert into region_state values
    ('Northen Region', 'New Jersey'),
    ('Northen Region', 'New York'),
    ('Midwest Region', 'Illinois')
    ...;
    

    Now we can use a join query to get the region for each employee state as the following:

    Select COUNT(DISTINCT E.Empl_ID) AS Employee_Count,
           R.region
    FROM Employee_Table E JOIN region_state R
    ON E.state = R.state
    GROUP BY R.region
    

    If there are no duplicates in the Empl_ID column (and that is the typical case, i.e. Empl_ID is the primary key of the table) you would use COUNT(E.Empl_ID), with no need for DISTINCT.