Search code examples
sqlt-sqlpostgresqlaggregate-functionsgreatest-n-per-group

How to convert TSQL aggregate function to Postgresql


I am having difficulty converting a view from T-SQL to Postgresql as it relates to the aggregate functions.

Here is my original SQL query:

SELECT TOP (100) PERCENT 
    thread_id, 
    MIN(message_id) AS message_id,  
    MIN(parent_message_id) AS parent_message_id, 
    MIN(created_at) AS initialResponse
FROM 
    dbo.bi_linear_thread
WHERE 
    LEFT([subject], 5) LIKE '%RE:%' AND parent_message_id IS NOT NULL
GROUP BY 
    thread_id
ORDER BY 
    thread_id

I have attempted to utilize the window function of:

first_value(message_id) OVER (Partition BY message_id ORDER BY messageid)

but continue to get incorrect returns.

Thoughts?

Edit for more context

-- after the answer provided. Hopeful this is helpful to others.

Needed to read the first line in a group that excluded the first record in a hierarchal order of threads to replies.
thread_id represents the thread.
message_id represents any message type whether a reply/original message. parent_message_id represents the original message in the thread.
"RE:" is a specified format for replies regardless whether nested or not in the subject field.


Solution

  • SELECT thread_id
          ,MIN(message_id) AS message_id
          ,MIN(parent_message_id) AS parent_message_id
          ,MIN(created_at) AS initialResponse
    FROM   dbo.bi_linear_thread
    WHERE  left(subject, 5) LIKE '%RE:%'
    AND    parent_message_id IS NOT NULL
    GROUP  BY thread_id
    ORDER  BY thread_id;
    

    Except for removing the illegal brackets from [subject] and also removing TOP (100) PERCENT, which is just noise anyway, the query should work.

    You might want ILIKE instead of LIKE for case-insensitive pattern-matching.

    left() was introduced with Postgres 9.1.

    If you want CaMeL-case identifiers you need to double-quote: "initialResponse". Else they are lower-cased automatically. My advice is to use lower-case identifiers exclusively.

    Educated guess

    If you actually want the row with the earliest created_at per thread_id, which you cunningly kept a secret, your query is incorrect, tSQL or PostgreSQL alike.
    You could use DISTINCT ON for that.

    You probably don't have a schema named dbo in Postgres either, and that's an artifact from SQL Server, so I removed that, too.

    I use ILIKE as explained above.

    SELECT DISTINCT ON (thread_id)
           thread_id, message_id, parent_message_id, created_at AS initial_response
    FROM   bi_linear_thread
    WHERE  left(subject, 5) ILIKE '%RE:%'
    AND    parent_message_id IS NOT NULL
    ORDER  BY thread_id, created_at;
    

    Information on DISTINCT ON:
    Select first row in each GROUP BY group?

    For perfomance, a trigram GIN index on the expression left(subject, 5) might help quite a bit. More info:
    PostgreSQL LIKE query performance variations