Search code examples
postgresqlpostgis

PostGIS: Best way to find users that crossed paths over last n days?


I have a table where I store users location data when they are running, cycling, walking etc. The table is designed as follows:

CREATE TABLE public.user_location (
    app_user_id int4 NOT NULL,
    location_uuid uuid NOT NULL,
    .....
    location_timestamp timestamptz NOT NULL,
    app_user_location geometry(POINT, 4326) NOT NULL,
    coordinate_accuracy float8 NULL,
);

The location_timestamp field contains the time location was recorded and app_user_location field contains the GPS coordinates. Location data is received every 100 meters if the user is moving (e.g. cycling/jogging etc.). Over days, weeks and months I have collected millions of location points. What I would like to do is to find out which users have crossed paths in last n days (or on any given day) i.e. they were near each other at the same point in time. A simple example would be if a user was jogging for 5 Kms. I need to find other users that he 'met' while he was jogging.

I have created an index on app_user_location column. Next, I would need to do a nearest neighbour for the set of points for the user on the given day (or at a given point in time). I can do it in a loop, but was wondering if there is better SQL way to do it?

Thanks in advance.


Solution

  • Here is a first go at something, I haven't tested it!

    Unless all your users live at the equator, I would recommend using geography instead of geometry for user location - it will simplify the distance calculations. If you are using a version of postgres the supports generated columns, you can do

    ALTER TABLE user_location ADD COLUMN app_user_location_geog GEOGRAPHY(POINT, 4326)
        GENERATED ALWAYS AS (app_user_location) STORED;
    

    Then you can find the ids based on matching the timestamp and ST_DWithin. This will likely require an index on the geography column, and on the timestamps.

    SELECT
      user1.app_user_id AS user1_id,
      user2.app_user_id AS user2_id,
      user1.app_user_location_geog AS user1_location,
      user2.app_user_location_geog AS user2_location,
      user1.location_timestamp AS crossing_time
    FROM
      user_location user1
      JOIN user_location user2 
        ON user1.app_user_id != user2.app_user_id
        -- geo timestamps occurred within a minute of each other
        AND
          user1.location_timestamp >= user2.location_timestamp - INTERVAL '30 seconds'
        AND
          user2.location_timestamp <= user2.location_timestamp + INTERVAL '30 seconds'
        AND
          -- geographies were within 10 meters of each other
          ST_DWithin(user1.app_user_location_geog, user2.app_user_location_geog, 10)