I have two tables tb1 and tb2. tb2 is connected to tb1 via a foreign key named tk_id. Here is how my two tables look like
tb1
Column | Type | Collation | Nullable | Default
-----------------+-----------------------------+-----------+----------+--------------------------------------------------
id | integer | | not null | nextval('tb2_id_seq'::regclass)
created_at | timestamp without time zone | | not null |
modified_at | timestamp without time zone | | not null |
status | double precision | | not null |
tk_id | uuid | | not null |
tb2
Column | Type | Collation | Nullable | Default
----------------+-----------------------------+-----------+----------+---------
id | uuid | | not null |
created_at | timestamp without time zone | | not null |
modified_at | timestamp without time zone | | not null |
destination_id | uuid | | not null |
source_id | uuid | | not null |
tk_id | uuid | | not null |
Now I need to get all rows from tb1 which has columns from both tb1 and tb2 when the tk_id value matches for both the rows.
This is what I tried:
select tb1.created_at, tb1.status, tb2.source_id, tb2.destination_id from tb1
inner join tb2 on tb1.tk_id = tb2.tk_id where
tb1.created_at > timezone('utc', now()) - interval '40 minutes';
But I am getting way too many rows. Usually in a 40min interval, there would be around 800 records but after the join, I am getting around 100,000+ records.
EDIT: After some reading and a few tries, I made some changes in my query and managed to get the rows down to the expected number of rows. This is my query now
SELECT count(*) FROM tb1 LEFT OUTER JOIN (SELECT DISTINCT tk_id FROM tb2) t2
ON tb1.tk_id = t2.tk_id where tb1.created_at > timezone('utc', now()) -
interval '40 minutes';
But now I can't get the columns of tb2 in my select query.
What am I doing wrong?
EDIT2: Sorry if I couldn't make this clearer earlier. The join condition should be based on the latest occurrence of tk_id of the right table. So for every row of left table (tb1), it should match against the latest occurrence of tk_id of right table (tb2) and fetch the right table's columns.
Since there are obviously many rows in tb2
for a single row in tb1
, you need to define which row to pick. Or some aggregation?
This query returns all qualifying rows from tb1
, and adds columns from the one matching row in tb2
that was created last (if any):
SELECT tb1.created_at, tb1.status, tb2.*
FROM tb1
LEFT JOIN LATERAL (
SELECT tb2.source_id, tb2.destination_id
FROM tb2
WHERE tb2.tk_id = tb1.tk_id
ORDER BY created_at DESC, id DESC
LIMIT 1
) tb2 ON true
WHERE tb1.created_at > timezone('utc', now()) - interval '40 minutes';
Added id DESC
as tiebreaker since created_at
might not be unique.
Adapt to your undisclosed needs.
Related:
Support this query with an index on tb2(tk_id, created_at, id)
.
And another index on tb1(created_at)
, obviously. Or even a covering index on tb1(created_at) INCLUDE (tk_id, status)
. See: