I have three tables:
create table genres
(
genre_id serial primary key,
genre_name varchar NOT NULL UNIQUE
);
create table movies
(
movie_id serial primary key,
movie_name varchar NOT NULL
);
create table movie_genres
(
movie_id integer references movies NOT NULL,
genre_id integer references genres NOT NULL,
PRIMARY KEY(movie_id, genre_id)
);
Tables genres and movies are full of data and I want to generate some random data for table movie_genres, so that every movie has at least one genre. I tried it this way, but then it is possible for a movie to be without any genre. Can anyone help me with that, please?
insert into movie_genres
select movie_id, genre_id
from genres cross join movies
where random() < 0.15;
Hmm, you can try to join a derived table in which you first select one random genre and then UNION
some more randomly.
INSERT INTO movie_genres
(movie_id,
genre_id)
SELECT m.movie_id,
rg.genre_id
FROM movies m
CROSS JOIN ((SELECT g.genre_id
FROM genres g
ORDER BY random()
LIMIT 1)
UNION
(SELECT g.genre_id
FROM genres g
WHERE random() < 0.15)) rg;
That however means that every movie has that one genre selected first. To overcome this and have the first genre be random per movie, a lateral join can be used. (Remark: You need to use some column from the outer table in the derived table as otherwise the optimizer seems to optimize the LATERAL
away.)
INSERT INTO movie_genres
(movie_id,
genre_id)
SELECT rg.movie_id,
rg.genre_id
FROM movies m
CROSS JOIN LATERAL ((SELECT g.genre_id,
m.movie_id -- that's just here to force the optimizer to keep the join lateral
FROM genres g
ORDER BY random()
LIMIT 1)
UNION
(SELECT g.genre_id,
m.movie_id
FROM genres g
WHERE random() < 0.15)) rg;