I am having difficulty converting a view from T-SQL to Postgresql as it relates to the aggregate functions.
Here is my original SQL query:
SELECT TOP (100) PERCENT
thread_id,
MIN(message_id) AS message_id,
MIN(parent_message_id) AS parent_message_id,
MIN(created_at) AS initialResponse
FROM
dbo.bi_linear_thread
WHERE
LEFT([subject], 5) LIKE '%RE:%' AND parent_message_id IS NOT NULL
GROUP BY
thread_id
ORDER BY
thread_id
I have attempted to utilize the window function of:
first_value(message_id) OVER (Partition BY message_id ORDER BY messageid)
but continue to get incorrect returns.
Thoughts?
-- after the answer provided. Hopeful this is helpful to others.
Needed to read the first line in a group that excluded the first record in a hierarchal order of threads to replies.
thread_id
represents the thread.
message_id
represents any message type whether a reply/original message. parent_message_id
represents the original message in the thread.
"RE:" is a specified format for replies regardless whether nested or not in the subject field.
SELECT thread_id
,MIN(message_id) AS message_id
,MIN(parent_message_id) AS parent_message_id
,MIN(created_at) AS initialResponse
FROM dbo.bi_linear_thread
WHERE left(subject, 5) LIKE '%RE:%'
AND parent_message_id IS NOT NULL
GROUP BY thread_id
ORDER BY thread_id;
Except for removing the illegal brackets from [subject]
and also removing TOP (100) PERCENT
, which is just noise anyway, the query should work.
You might want ILIKE
instead of LIKE
for case-insensitive pattern-matching.
left()
was introduced with Postgres 9.1.
If you want CaMeL-case identifiers you need to double-quote: "initialResponse". Else they are lower-cased automatically. My advice is to use lower-case identifiers exclusively.
If you actually want the row with the earliest created_at
per thread_id
, which you cunningly kept a secret, your query is incorrect, tSQL or PostgreSQL alike.
You could use DISTINCT ON
for that.
You probably don't have a schema named dbo
in Postgres either, and that's an artifact from SQL Server, so I removed that, too.
I use ILIKE
as explained above.
SELECT DISTINCT ON (thread_id)
thread_id, message_id, parent_message_id, created_at AS initial_response
FROM bi_linear_thread
WHERE left(subject, 5) ILIKE '%RE:%'
AND parent_message_id IS NOT NULL
ORDER BY thread_id, created_at;
Information on DISTINCT ON
:
Select first row in each GROUP BY group?
For perfomance, a trigram GIN index on the expression left(subject, 5)
might help quite a bit. More info:
PostgreSQL LIKE query performance variations