Search code examples
sqlpostgresqlpostgisaggregation

How to aggregate only neighbour records in sql?


Let us suppose I have a table called locations, it contains records for a source's position and when this position was recorded.

id source_id ts position
1 3134 2021-02-21 20:48:08.488359 ...
2 3134 2021-02-21 20:48:09.123422 ...
... ... ... ...

ts is short for timestamp and the position is in postgis format but it doesn't really matter for this question.

And I have another table called zones:

id area
1 ...
2 ...
... ...

The area is a polygon in postgis format.

I can test if a point is in an area for example

SELECT locations.source_id, zones.id, locations.ts
from locations inner join
     zones
     on ST_Contains(zones.area, locations.position);

will tell me which source is in which zone when.

What I would like is a query which produces the following output:

locations.source_id zones.id in_zone_time
3134 1 1 mins
3134 2 4 mins
3134 1 2 mins
... ... ...

So it will tell me that the source 3134 was in zone 1 for 1 minute, after that it was in zone 2 for 4 minutes and then it was in zone 1 again for 2 minutes.

How do I achieve aggregation in this way?


Solution

  • You can add a flag and then treat this as a gaps-and-islands problem. It is not clear how time is calculated, but the idea is:

    select source_id, id, min(ts), max(ts), max(ts) - min(ts) as diff
    from (select l.source_id, z.id, l.ts,
                 row_number() over (partition by l.source_id order by l.ts) as seqnumm
                 row_number() over (partition by l.source_id, z.id order by l.ts) as seqnum_2
          from locations l inner join
               zones
               on ST_Contains(z.area, l.position)
         ) lz
    group by (seqnum - seqnum_2);
    

    This makes a few assumptions:

    • All points are in a zone. Or at least, you only care about the points within a zone.
    • Poinsts are only in one zone at a time.
    • The time spent in a zone is the difference between the earliest and last records of being in the zone.

    All of these are a quite reasonable interpretation of your question, but can be relaxed. If your requirements are different, I would suggest that you ask a new question, being explicit about the details of the problem. Sample data and corresponding desired results are very helpful.