sql postgresql database-design many-to-many database-normalization

How to store array of tables in Schema

So I am a student currently learning about PostgreSQL. I am trying to figure out the way, how to randomly seed data. I have 10M users and 100 stocks.

Currently my tables will look like:

CREATE TABLE user (
  user_id INTEGER NOT NULL,   
  amount_of_stocks [][] array, -- this is just assumption
  PRIMARY KEY (user_id)
);

CREATE TABLE stock (
  stock_id INTEGER NOT NULL,   
  amount_per_stock INT,
  quantity INT
  PRIMARY KEY (stock_id)
);

How would I store 100 different stocks for each user?

Solution

Sounds like a classical many-to-many relationship. Should not involve arrays at all. Assuming Postgres 10 or later, use something along these lines:

CREATE TABLE users (  -- "user" is a reserved word!
  user_id  int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, username text UNIQUE NOT NULL  -- or similar
);

CREATE TABLE stock (
  stock_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, stock    text UNIQUE NOT NULL  -- or similar
);

CREATE TABLE user_stock (
  user_id  int REFERENCES users
, stock_id int REFERENCES stock
, amount   int NOT NULL
, PRIMARY KEY (user_id, stock_id)
);

Detailed explanation:

Seed

Postgres provides generate_series() to conveniently generate rows. random() is instrumental to generate random numbers:

INSERT INTO users(username)
SELECT 'user_' || g
FROM   generate_series(1, 10000000) g; -- 10M (!) - try with just 10 first

INSERT INTO stock(stock)
SELECT 'stock_' || g
FROM   generate_series(1, 100) g;

Experiment with a small number of users first. 10M users * 100 stocks generates a billion rows. Takes some time and occupies some space.

How would I store 100 different stocks for each user?

INSERT INTO user_stock
        (user_id,   stock_id, amount)
SELECT u.user_id, s.stock_id, ceil(random() * 1000)::int
FROM   users u, stock s;  -- cross join

Every user gets 100 different stocks - though everyone gets the same set in this basic example, you did not define more closely. I added a random amount per stock between 1 and 1000.

About the cross join to produce the Cartesian product:

What does [FROM x, y] mean in Postgres?