sql snowflake-cloud-data-platform grouping

How to group states into regions when there is no region table

My data set has states as a column but no region column to group the states by. I would like to group the states into standard census bureau regions to get the count of employee IDs by region:

Select COUNT(DISTINCT Empl_ID) AS Employee_Count,
STATE
FROM Employee_Table
GROUP BY STATE

I tried exporting the query and then sorting in Excel but the dataset is too large for Excel.

Solution

Create a table based on the data provided in the image:

create table region_state (region varchar(30), state varchar(30));
insert into region_state values
('Northen Region', 'New Jersey'),
('Northen Region', 'New York'),
('Midwest Region', 'Illinois')
...;

Now we can use a join query to get the region for each employee state as the following:

Select COUNT(DISTINCT E.Empl_ID) AS Employee_Count,
       R.region
FROM Employee_Table E JOIN region_state R
ON E.state = R.state
GROUP BY R.region

If there are no duplicates in the Empl_ID column (and that is the typical case, i.e. Empl_ID is the primary key of the table) you would use COUNT(E.Empl_ID), with no need for DISTINCT.