Search code examples
sqlpostgresqldatabase-designmany-to-manydatabase-normalization

How to store array of tables in Schema


So I am a student currently learning about PostgreSQL. I am trying to figure out the way, how to randomly seed data. I have 10M users and 100 stocks.

Currently my tables will look like:

CREATE TABLE user (
  user_id INTEGER NOT NULL,   
  amount_of_stocks [][] array, -- this is just assumption
  PRIMARY KEY (user_id)
);

CREATE TABLE stock (
  stock_id INTEGER NOT NULL,   
  amount_per_stock INT,
  quantity INT
  PRIMARY KEY (stock_id)
);

How would I store 100 different stocks for each user?


Solution

  • Sounds like a classical many-to-many relationship. Should not involve arrays at all. Assuming Postgres 10 or later, use something along these lines:

    CREATE TABLE users (  -- "user" is a reserved word!
      user_id  int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
    , username text UNIQUE NOT NULL  -- or similar
    );
    
    CREATE TABLE stock (
      stock_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
    , stock    text UNIQUE NOT NULL  -- or similar
    );
    
    CREATE TABLE user_stock (
      user_id  int REFERENCES users
    , stock_id int REFERENCES stock
    , amount   int NOT NULL
    , PRIMARY KEY (user_id, stock_id)
    );
    

    Detailed explanation:

    Seed

    Postgres provides generate_series() to conveniently generate rows. random() is instrumental to generate random numbers:

    INSERT INTO users(username)
    SELECT 'user_' || g
    FROM   generate_series(1, 10000000) g; -- 10M (!) - try with just 10 first
    
    INSERT INTO stock(stock)
    SELECT 'stock_' || g
    FROM   generate_series(1, 100) g;
    

    Experiment with a small number of users first. 10M users * 100 stocks generates a billion rows. Takes some time and occupies some space.

    How would I store 100 different stocks for each user?

    INSERT INTO user_stock
            (user_id,   stock_id, amount)
    SELECT u.user_id, s.stock_id, ceil(random() * 1000)::int
    FROM   users u, stock s;  -- cross join
    

    Every user gets 100 different stocks - though everyone gets the same set in this basic example, you did not define more closely. I added a random amount per stock between 1 and 1000.

    About the cross join to produce the Cartesian product: