Search code examples
sqlrubypostgresqldata-modelingsequel

Moving table columns to new table and referencing as foreign key in PostgreSQL


Suppose we have a DB table with fields

"id", "category", "subcategory", "brand", "name", "description", etc. 

What's a good way of creating separate tables for category, subcategory and brand and the corresponding columns and rows in the original table becoming foreign key references?

To outline the operations involved:

  • get all unique values in each column of the original table which should become foreign keys;
  • create tables for those
  • create foreign key reference columns in the original table (or a copy)

In this case, the PostgreSQL DB is accessed via Sequel in a Ruby app, so available interfaces are the command line, Sequel, PGAdmin, etc...

The question: how would you do this?


Solution

  •         -- Some test data
    CREATE TABLE animals
            ( id SERIAL NOT NULL PRIMARY KEY
            , name varchar
            , category varchar
            , subcategory varchar
            );
    INSERT INTO animals(name, category, subcategory) VALUES
     ( 'Chimpanzee' , 'mammals', 'apes' )
    ,( 'Urang Utang' , 'mammals', 'apes' )
    ,( 'Homo Sapiens' , 'mammals', 'apes' )
    ,( 'Mouse' , 'mammals', 'rodents' )
    ,( 'Rat' , 'mammals', 'rodents' )
            ;
    
            -- [empty] table to contain the "squeezed out" domain
    CREATE TABLE categories
            ( id SERIAL NOT NULL PRIMARY KEY
            , category varchar
            , subcategory varchar
            , UNIQUE (category,subcategory)
            );
    
            -- The original table needs a "link" to the new table
    ALTER TABLE animals
            ADD column category_id INTEGER -- NOT NULL
            REFERENCES categories(id)
            ;
            -- FK constraints are helped a lot by a supportive index.
    CREATE INDEX animals_categories_fk ON animals (category_id);
    
            -- Chained query to:
            -- * populate the domain table
            -- * initialize the FK column in the original table
    WITH ins AS (
            INSERT INTO categories(category, subcategory)
            SELECT DISTINCT a.category, a.subcategory
            FROM animals a
            RETURNING *
            )
    UPDATE animals ani
    SET category_id = ins.id
    FROM ins
    WHERE ins.category = ani.category
    AND ins.subcategory = ani.subcategory
            ;
    
            -- Now that we have the FK pointing to the new table,
            -- we can drop the redundant columns.
    ALTER TABLE animals DROP COLUMN category, DROP COLUMN subcategory;
    
            -- show it to the world
    SELECT a.*
            , c.category, c.subcategory
    FROM animals a
    JOIN categories c ON c.id = a.category_id
            ;
    

    Note: the fragment:

    WHERE ins.category = ani.category AND ins.subcategory = ani.subcategory

    will lead to problems if these columns contain NULLs. It would be better to compare them using

    (ins.category,ins.subcategory) IS NOT DISTINCT FROM (ani.category,ani.subcategory)