Search code examples
c#sql-serversql-server-2008computational-geometrygeography

How to "average" two or more geography linestrings using C#/SQL Server 2008 spatial


Suppose I've got a set of results from a study into the behaviour of a particular migratory bird. The bird has been tagged, and a GPS receiver records the migration path it follows each year over a five year period. The results are stored in a SQL Server table that contains one geography linestring for each year's path.

How would you go about defining the linestring representing the "average" path followed over the five year period?

Note that each sample linestring may contain a different number of points. They also don't start and end at exactly the same points.

The best approach I've got so far is to use interpolation to determine the points at certain set proportions along each linestring. So, for example, the start point, a quarter of the way along, halfway along each route etc. Then calculate the mean average lat/long of those positions across all routes, and construct a new geography linestring from those averaged points.

I've looked in a few computational geometry books to see if there's a better known algorithm or technique to do this, but there doesn't seem to be anything relevant. I can't believe that it isn't something that somebody else hasn't done before though...

I don't need exact code - just suggestions for any better general approaches. I don't need "super-accuracy" either. As a sidenote, I'd ideally like the approach to be applicable to two or more polygons too.

Thanks for any suggestions!


Solution

  • I can't really post any sample code as I am working from my iPhone right now, but I do have a suggestion (don't know if it's good or bad) ...

    For each line, determine each vertex's position (percentage) along the line.

    After getting those values, per line, compute new vertices along each line using all of the OTHER lines' percentage values.

    At this point, each line should contain the same number of vertices and the Nth vertex of each line corresponds directly with the Nth vertex of every other line.

    Now just average vertex 0 for every line to get vertex 0 of the "averaged" line. Repeat for vertex1 of each line, etc.

    This should work for lines as well as polygons.

    Note that you could also employ a weighted averaging algorithm if you could determine an accuracy value for each line. In the past I have used this approach when trying to average two lines. We had the ability to allow each line to be weighted, typically 50:50, but could go all the way to 100:0 or 0:100, depending on the accuracy if the sources.

    I went back and reread your question and saw that you already talked about interpolation. The way you talked about doing it seems like it could smooth or generalize the lines before computing the averages of the interpolated points (the fixed interval points). With my approach you would be densifying each line first and then computing the averages. It seems like that might be more accurate, but maybe not.