I am trying to create a dimension table dim_airport from a raw_data table. The raw table has 2 airportcode columns (origin and destination). I want to get an exhaustive distinct set of all airport codes and create a table with that. In addition, I also need to add a serial key to perform auto increment. I can do this row_number but wanted to find another way to do it.
create table DIM_airport (airportkey serial primary key )
as
select distinct originairportcode as airportcode,
origairportname as airportname,
origincityname as city from raw_data
union
select distinct destairportcode, destairportname, destcityname
from raw_data;
If i defined a row_number window function over airportcode it works. I am looking for a solution where it can auto increment directly without explicitly defining the value from row_number()
The syntax of "Create table ... As ..." does not allow defining additional columns, it defines only columns from the select statement. Of course you can persist by selecting a constant as a place holder and establish the column name. Then create a sequence, update the table to set value for the place holder and then a few alter tables to complete the desired definition.
A much easier way would just be a 2 step process:
create table DIM_airport( airportkey integer generated always as identity
, airportcode text
, airportname text
, city text
, constraint DIM_airport_pk
primary key (airportkey)
) ;
insert into DIM_airport(airportcode, airportname, city)
select originairportcode
, origairportname
, origincityname
union
select destairportcode
, destairportname
, destcityname ;
You do not need DISTINCT on either select as the UNION itself eliminates duplicates. See examples here.