Search code examples
sqlpostgresqltime-seriestimescaledblateral-join

SQL query for finding latest or max value of timestamp from table corresponding to N ids Timescaledb


I have a table tab1 in timescale db which has 3 columns tag, time, value. time and tag make up the pk for the table: (time, tag).

There are more than 500 lakhs (50 000 000) of rows. I need to find the latest timestamp or the max(time) for each of the N tags.

There are few things that I tried, I'll share my experience with each:

  1. inner query
SELECT "time", "tag", "value"
FROM tab1 
WHERE ("tag","time") IN 
(SELECT "tag", MAX("time") FROM tab1 WHERE "tag" IN(tag1,tag2) GROUP BY "tag" );

this is giving result but taking around 19 sec to execute which is beyond the acceptable limit

  1. using last function from timescale db
SELECT tag, last(time, time), last(value,time)
FROM tab1
WHERE "tag" IN (tag1,tag2) GROUP BY "tag" ;

this is giving output within 10 sec.

I need to find another feasible solution similar to 2nd one that is perhaps performing better. I tried few other things like LATERAL JOIN (3), WINDOW FUNCTIONS (ROW_NUMBER, PARTITION) (4) but the solution is not as expected.

  1. using lateral gives cross of multiple columns and not one single value with max time as expected. Moreover it is taking 15sec to execute but it might be due to wrong query.
SELECT table1."tag", table1."time",table1."value" from tab1 as table1
join lateral (
    SELECT table2 ."tag",table2 ."time" from tab1 as table2   
    where table2."tag" = table1."tag"
    order by table2."time" desc limit 1
) p on true
where table1."tag" in (tag1,tag2)
  1. when trying partition I want to put limit 1 like this: (partition by tag order by time desc limit 1) but it is giving syntax error. without limit 1 I am not getting the latest time.
SELECT * from 
( SELECT *, row_number() over (partition by tag order by time desc) as rownum
from  tab1) a
where tag in (tag1,tag2)

Can anyone suggest what's wrong with the queries in 3,4 or suggest if there is another alternative.

Index for my table is: enter image description here


Solution

  • There are a couple things that are going to help this and make this query easier and more performant. The first, and probably most important one is the index on the table / hypertable - it will need to be a multi-column index on tag, time desc - the order of the time doesn't matter all that much, but the order of columns in the index matters a lot. tag has to be the first column here because we need to first search by tag, then get the latest time, if we have either separate indexes or if we're ordered by time first, this will be highly inefficient.

    You can create this index with a call like this:

    CREATE INDEX ON tab1 (tag, "time" DESC);
    

    The next thing is the formulation of the query. The simplest way to get this for each tag is to write a DISTINCT ON query. In Timescale we've optimized this sort of query. It's a bit of a weird way to formulate it so can be a bit hard to find though.

    Basically you'd write it something like:

    SELECT DISTINCT ON (tag) tag, "time" FROM tab1 ORDER BY tag, "time" DESC;
    

    And that should give you what you want. It's a bit weird, but it will work!

    I'm not going to go through the other approaches, but most of them will be improved dramatically with the index, but this is still likely going to be the most performant approach.

    If you want to, please post a comment with how this worked and if it sped things up for you!