python pandas numpy math distance

How to calculate the (smallest) distance between a set of lines and a point?


So basically I am trying to label if some GPS points are very close to a road.

I have vectors of lat-lon coordinates which, when each row is connected to the next one, form a line that represents a road (shown as colored matplotlib plots). I also have simple geographic points (lat-lon), shown as a black matplotlib scatter plot.

I would like to label whether a point is near a road (e.g. within a distance of 0.001 radians). For that, I am guessing one would need to calculate the smallest distance from a point to this set of vectors.

#example vector 1
[[-84.3146272, 33.7741084], [-84.3145183, 33.7741196]]
#example vector 2
[[-84.4043106, 33.7700542], [-84.4045421, 33.770055]]

#example point to predict whether it will be near one of these two lines
[-84.31106, 33.73887]

How can one tackle this problem? I cannot think of a way to solve this, but when looking at the plot it seems simple... Is there a library that could help?


Solution

  • I am assuming you are working with the spherical model of the Earth. Then I suppose what you call "lines" are in fact arc-segments of great circles (the straight lines of spherical geometry). In other words, you have one point p1 on the surface of the sphere with lat-lon coordinates [lat1, lon1] and another point p2 on the sphere with lat-lon coordinates [lat2, lon2]. What is considered "the straight line" on the sphere that passes through p1 and p2 is the unique circle (called great circle) obtained by the intersection of the sphere with the plane that passes through the center of the sphere and the two points p1 and p2. Then, what you call a "line", is the smaller of the two circular arcs from this great circle, bounded by the two points p1 and p2.

    The distance you would like to calculate (in radians) could be the distance from a third point p with lat-lon coordinates [lat, lon] to the arc-segment determined by the two points p1 and p2. The said distance should be the arc-length of the arc from the great circle passing through p and perpendicular to the great circle of p1 and p2. This perpendicular great circle is the one determined by the intersection of the sphere with the plane perpendicular to the plane of the great circle of p1 and p2 and passing through the point p and the center of the sphere. If the intersection of the perpendicular great circle with the great-circle arc p1 p2 is a point h inside the arc-segment p1 p2, then the length of the great-circle arc p h is the sought distance. If, however, h is outside the arc p1 p2, then the sought distance is either p p1 or p p2, whichever is smaller.

    Here is some Matlab code that calculates the shortest distance between a point and an arc-interval:

    lat_lon = [lat, lon];
    lat_lon1 = [lat1, lon1];
    lat_lon2 = [lat2, lon2];
    
    function dist = dist_point_2_road(lat_lon, lat_lon1, lat_lon2)
    
       % inputs are lat-long angles in degrees (cosd and sind expect degrees); 
       % if your angles are in radians, use cos and sin instead 
    
       % First, convert from lat-long coordinates to 3D coordinates of points on the unit    
       % sphere. Since Earth's radius cancels out in our computations, we simply assume it 
       % is R = 1 
       lat = lat_lon(1);
       lon = lat_lon(2);
    
       lat1 = lat_lon1(1);
       lon1 = lat_lon1(2);
    
       lat2 = lat_lon2(1);
       lon2 = lat_lon2(2);
    
       p1 = [ cosd(lat1)*cosd(lon1),  cosd(lat1)*sind(lon1),  sind(lat1) ]; %cosd = cos(degrees)
       p2 = [ cosd(lat2)*cosd(lon2),  cosd(lat2)*sind(lon2),  sind(lat2) ]; %sind = sin(degrees)
       p = [ cosd(lat)*cosd(lon),  cosd(lat)*sind(lon),  sind(lat) ];
    
       % n12 is the unit vector perpendicular to the plane of the great circle 
       % determined by the points p1 and p2  
       n12 = cross(p1, p2);
       n12 = n12 / sqrt(dot(n12, n12));
       sin_of_dist = dot(p, n12); % sine of the angle that equals arc-distance 
                                  % from point p to the great arc p1 p2  
    
       dist = pi/2 - acos(abs(sin_of_dist)); % acos = arccos, abs() = absolute value
              % dist is the shortest distance in radians from p to the 
              % great circle determined by the points p1 and p2
    
       n1 = cross(p1, p); 
       n1 = n1 / sqrt(dot(n1, n1));
       % unit normal vector perpendicular to the great-arc determined by p and p1
    
       n2 = cross(p, p2);
       n2 = n2 / sqrt(dot(n2, n2));
       % unit normal vector perpendicular to the great-arc determined by p and p2
    
       if dot(n12, n1) < 0 % if the angle of spherical triangle p p1 p2 at vertex p1 is obtuse 
          dist = acos(dot(p, p1)); % the shortest distance is p p1 
       elseif dot(n12, n2) < 0 % if the angle of spherical triangle p p1 p2 at vertex p2 is obtuse 
          dist = acos(dot(p, p2)); % the shortest distance is p p2 
       end
    
       % the function returns the appropriate dist as output 
    
    end
    

    You can iterate this for the sequence of arc-intervals that form a road and select the smallest distance to an arc-interval.
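
    Since the question is tagged Python, here is a minimal sketch of the same spherical computation using only the standard library, including the loop over consecutive points of a road. The function names (`to_unit_vector`, `dist_point_to_arc`, `dist_point_to_road`) are made up for illustration. Two details differ from the Matlab code and are my own choices: angles between unit vectors are computed with `atan2` rather than `acos(dot(...))`, which behaves better for very small distances, and `asin` replaces `pi/2 - acos`. Inputs are assumed to be (lat, lon) pairs in degrees. The pairs in the question look like they are ordered [lon, lat], so the usage lines swap them; that ordering is an assumption.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def angle(u, v):
    """Angle in radians between two unit vectors (stable for tiny angles)."""
    c = cross(u, v)
    return math.atan2(math.sqrt(dot(c, c)), dot(u, v))

def dist_point_to_arc(p_latlon, a_latlon, b_latlon):
    """Shortest distance in radians from p to the great-circle arc a-b."""
    p = to_unit_vector(*p_latlon)
    p1 = to_unit_vector(*a_latlon)
    p2 = to_unit_vector(*b_latlon)
    n12 = cross(p1, p2)                  # normal of the plane through p1, p2
    norm = math.sqrt(dot(n12, n12))
    if norm == 0.0:                      # degenerate arc: fall back to p1
        return angle(p, p1)
    if dot(n12, cross(p1, p)) < 0:       # obtuse angle at p1: p1 is closest
        return angle(p, p1)
    if dot(n12, cross(p, p2)) < 0:       # obtuse angle at p2: p2 is closest
        return angle(p, p2)
    # otherwise the foot of the perpendicular lies inside the arc
    return math.asin(min(1.0, abs(dot(p, n12)) / norm))

def dist_point_to_road(p_latlon, road_latlon):
    """Smallest distance in radians from p to a polyline of (lat, lon) points."""
    return min(dist_point_to_arc(p_latlon, a, b)
               for a, b in zip(road_latlon, road_latlon[1:]))

# Usage with the question's data, assuming its pairs are [lon, lat]:
road = [[-84.3146272, 33.7741084], [-84.3145183, 33.7741196]]
point = [-84.31106, 33.73887]
road_latlon = [(lat, lon) for lon, lat in road]
d = dist_point_to_road((point[1], point[0]), road_latlon)
is_near = d < 0.001                      # threshold in radians, as in the question
```

    For many points and many roads, the same arithmetic can be vectorized with NumPy arrays instead of tuples.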

    According to this computation, the distance of the point to the first "vector 1" is 0.0000615970599633145 radians and the distance to the second "vector 2" is 0.00162840939265068 radians. The point is closest to a point inside vector 1, but for vector 2, it is closest to one of the ends of the arc-interval.

    Edit. If one wants to use a Euclidean (flat) approximation, ignoring Earth's curvature, one needs to convert lat-lon coordinates into approximately Euclidean flat coordinates. To avoid any map-specific coordinates, one may be tempted to plot latitude against longitude directly. That may be fine near the equator, but the closer one gets to the poles, the less accurately these coordinates represent distances. This is because, closer to the poles, one degree along a fixed latitude covers a much shorter distance than one degree along a fixed longitude. The discrepancy is corrected by using the Riemannian metric on the sphere in lat-long coordinates, or simply by looking at the 3D geometry of latitude and longitude circles near a given point on the sphere.

    lat_long = [lat, lon];
    lat_long1 = [lat1, lon1];
    lat_long2 = [lat2, lon2];
    
    %center of approximate Euclidean coordinate system is point p 
    % with lat_long coordinates and the scaling coefficient of longitude, 
    % which equalizes longitude and latitude distance at point p, is
    
    a = cosd(lat_long(1));  
    
    function  x = convert_2_Eucl(lat_long1, lat_long, a)
       x = [lat_long1(1) - lat_long(1),  a*(lat_long1(2) - lat_long(2))];
    end
    
    % convert from lat-long to approximate Euclidean coordinates
    x1 = convert_2_Eucl(lat_long1, lat_long, a);
    x2 = convert_2_Eucl(lat_long2, lat_long, a);
    
    function dist = dist_point_2_road(x1, x2)
    
       dist = dot(x1, x1) * dot(x2 - x1, x2 - x1) - dot(x1, x2 - x1)^2 ;
       dist = sqrt( dist / dot(x2 - x1, x2 - x1) );
       % dist is the distance (in degrees, since x1 and x2 are in degrees) 
       % from the point p, which has Eucl coordinates [0,0], to the straight 
       % Euclidean line through x1 and x2 representing the interval p1 p2
    
       if dot(x1, x2 - x1) > 0
          dist = sqrt( dot(x1, x1) );
       elseif dot(x2, x1 - x2) > 0
          dist = sqrt( dot(x2, x2) );
       end
    
    end
    

    Remark: the latter function calculates distance, but it might be equally convenient to just calculate dist^2, avoiding the calculation of the square root sqrt in order to speed up the performance. Measuring with respect to dist^2 should work just as well.
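
    The flat approximation can be sketched in Python in the same spirit. This is only an illustration under stated assumptions: the names `to_local_xy`, `dist2_origin_to_segment` and `flat_dist_deg` are made up, inputs are (lat, lon) pairs in degrees, and the result is in degrees (use `math.radians` to compare it with a threshold in radians). Instead of branching on the signs of dot products as in the Matlab code, the sketch clamps the projection parameter to [0, 1], which selects the same closest point. The squared distance is exposed separately so the square root can be skipped.

```python
import math

def to_local_xy(latlon, origin_latlon):
    """Offsets from the origin in degrees, longitude scaled by cos(origin latitude)."""
    a = math.cos(math.radians(origin_latlon[0]))
    return (latlon[0] - origin_latlon[0],
            a * (latlon[1] - origin_latlon[1]))

def dist2_origin_to_segment(x1, x2):
    """Squared distance from (0, 0) to the straight segment x1-x2."""
    dx, dy = x2[0] - x1[0], x2[1] - x1[1]
    len2 = dx * dx + dy * dy
    if len2 == 0.0:                       # degenerate segment: a single point
        return x1[0] ** 2 + x1[1] ** 2
    # parameter of the foot of the perpendicular, clamped onto the segment
    t = -(x1[0] * dx + x1[1] * dy) / len2
    t = max(0.0, min(1.0, t))
    fx, fy = x1[0] + t * dx, x1[1] + t * dy
    return fx * fx + fy * fy

def flat_dist_deg(p_latlon, a_latlon, b_latlon):
    """Approximate distance in degrees from p to the segment a-b."""
    x1 = to_local_xy(a_latlon, p_latlon)  # p becomes the origin [0, 0]
    x2 = to_local_xy(b_latlon, p_latlon)
    return math.sqrt(dist2_origin_to_segment(x1, x2))
```

    To compare against a squared threshold, call `dist2_origin_to_segment` directly and square the threshold once, outside the loop.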

    Choose whichever function suits you, the spherical one or the approximately Euclidean one. The latter is probably faster.

    I wrote this in a hurry, so there might be some inaccuracies.