I have a table where I store users location data when they are running, cycling, walking etc. The table is designed as follows:
CREATE TABLE public.user_location (
app_user_id int4 NOT NULL,
location_uuid uuid NOT NULL,
.....
location_timestamp timestamptz NOT NULL,
app_user_location geometry(POINT, 4326) NOT NULL,
coordinate_accuracy float8 NULL,
);
The location_timestamp
field contains the time location was recorded and app_user_location
field contains the GPS coordinates. Location data is received every 100 meters if the user is moving (e.g. cycling/jogging etc.).
Over days, weeks and months I have collected millions of location points. What I would like to do is to find out which users have crossed paths in last n days (or on any given day) i.e. they were near each other at the same point in time. A simple example would be if a user was jogging for 5 Kms. I need to find other users that he 'met' while he was jogging.
I have created an index on app_user_location
column. Next, I would need to do a nearest neighbour for the set of points for the user on the given day (or at a given point in time). I can do it in a loop, but was wondering if there is better SQL way to do it?
Thanks in advance.
Here is a first go at something, I haven't tested it!
Unless all your users live at the equator, I would recommend using geography instead of geometry for user location - it will simplify the distance calculations. If you are using a version of postgres the supports generated columns, you can do
ALTER TABLE user_location ADD COLUMN app_user_location_geog GEOGRAPHY(POINT, 4326)
GENERATED ALWAYS AS (app_user_location) STORED;
Then you can find the ids based on matching the timestamp and ST_DWithin. This will likely require an index on the geography column, and on the timestamps.
SELECT
user1.app_user_id AS user1_id,
user2.app_user_id AS user2_id,
user1.app_user_location_geog AS user1_location,
user2.app_user_location_geog AS user2_location,
user1.location_timestamp AS crossing_time
FROM
user_location user1
JOIN user_location user2
ON user1.app_user_id != user2.app_user_id
-- geo timestamps occurred within a minute of each other
AND
user1.location_timestamp >= user2.location_timestamp - INTERVAL '30 seconds'
AND
user2.location_timestamp <= user2.location_timestamp + INTERVAL '30 seconds'
AND
-- geographies were within 10 meters of each other
ST_DWithin(user1.app_user_location_geog, user2.app_user_location_geog, 10)