sql ruby postgresql data-modeling sequel

Moving table columns to new table and referencing as foreign key in PostgreSQL

Suppose we have a DB table with fields

"id", "category", "subcategory", "brand", "name", "description", etc.

What's a good way of creating separate tables for category, subcategory and brand and the corresponding columns and rows in the original table becoming foreign key references?

To outline the operations involved:

get all unique values in each column of the original table which should become foreign keys;
create tables for those
create foreign key reference columns in the original table (or a copy)

In this case, the PostgreSQL DB is accessed via Sequel in a Ruby app, so available interfaces are the command line, Sequel, PGAdmin, etc...

The question: how would you do this?

Solution

        -- Some test data
CREATE TABLE animals
        ( id SERIAL NOT NULL PRIMARY KEY
        , name varchar
        , category varchar
        , subcategory varchar
        );
INSERT INTO animals(name, category, subcategory) VALUES
 ( 'Chimpanzee' , 'mammals', 'apes' )
,( 'Urang Utang' , 'mammals', 'apes' )
,( 'Homo Sapiens' , 'mammals', 'apes' )
,( 'Mouse' , 'mammals', 'rodents' )
,( 'Rat' , 'mammals', 'rodents' )
        ;

        -- [empty] table to contain the "squeezed out" domain
CREATE TABLE categories
        ( id SERIAL NOT NULL PRIMARY KEY
        , category varchar
        , subcategory varchar
        , UNIQUE (category,subcategory)
        );

        -- The original table needs a "link" to the new table
ALTER TABLE animals
        ADD column category_id INTEGER -- NOT NULL
        REFERENCES categories(id)
        ;
        -- FK constraints are helped a lot by a supportive index.
CREATE INDEX animals_categories_fk ON animals (category_id);

        -- Chained query to:
        -- * populate the domain table
        -- * initialize the FK column in the original table
WITH ins AS (
        INSERT INTO categories(category, subcategory)
        SELECT DISTINCT a.category, a.subcategory
        FROM animals a
        RETURNING *
        )
UPDATE animals ani
SET category_id = ins.id
FROM ins
WHERE ins.category = ani.category
AND ins.subcategory = ani.subcategory
        ;

        -- Now that we have the FK pointing to the new table,
        -- we can drop the redundant columns.
ALTER TABLE animals DROP COLUMN category, DROP COLUMN subcategory;

        -- show it to the world
SELECT a.*
        , c.category, c.subcategory
FROM animals a
JOIN categories c ON c.id = a.category_id
        ;

Note: the fragment:

WHERE ins.category = ani.category AND ins.subcategory = ani.subcategory

will lead to problems if these columns contain NULLs. It would be better to compare them using

(ins.category,ins.subcategory) IS NOT DISTINCT FROM (ani.category,ani.subcategory)