Tags: python, machine-learning, classification, geospatial, feature-extraction

Feature Extraction for Geospatial Vector Data


I am working on a binary classification problem: classifying road intersections as roundabouts or non-roundabouts. The available input data consists of the GPS latitude/longitude points that fall inside each intersection polygon, so each sample is a list of GPS points known to lie within the intersection.

As such, I am interested in machine learning / deep learning techniques for classifying geospatial vector data specifically (as opposed to raster data). I've searched the web quite a bit, and it seems that most ML research on geospatial data focuses on raster data. The only paper I found that applies learning techniques to geospatial vector data is https://arxiv.org/abs/1806.03857, which deals with polygon data, not points. I was considering using the (projected and scaled) point coordinates as features, but since each intersection contains a different number of points, the feature vectors would have variable length.
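
To make "projected and scaled" concrete, here is a minimal sketch of one way it could be done. It assumes each sample is a list of (latitude, longitude) pairs in degrees and uses an equirectangular approximation around the sample's centroid, which is only reasonable for small areas such as a single intersection. The function name `project_and_scale` and the Earth-radius constant are illustrative choices, not something from the original data pipeline.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (approximation)


def project_and_scale(points):
    """Map (lat, lon) pairs in degrees to centred local x/y in [-1, 1].

    Hypothetical helper: equirectangular projection around the centroid,
    then division by the largest absolute coordinate.
    """
    lat0 = sum(lat for lat, _ in points) / len(points)
    lon0 = sum(lon for _, lon in points) / len(points)
    cos_lat0 = math.cos(math.radians(lat0))

    # Local metric coordinates relative to the centroid.
    xy = [
        (
            EARTH_RADIUS_M * math.radians(lon - lon0) * cos_lat0,
            EARTH_RADIUS_M * math.radians(lat - lat0),
        )
        for lat, lon in points
    ]

    # One common scale for both axes keeps the shape undistorted.
    # Fall back to 1.0 if all points coincide.
    scale = max(max(abs(x), abs(y)) for x, y in xy) or 1.0
    return [(x / scale, y / scale) for x, y in xy]
```

The output still has one (x, y) pair per input point, so the variable-length problem remains; this step only removes the dependence on absolute position and size.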

How should I approach feature engineering / extraction in this case? I suspect that simply zero-padding the point coordinates to a fixed length isn't going to work because of the curse of dimensionality, especially given that I only have ~800 intersection samples.


Solution

  • I have never worked with this kind of geospatial data before, but here is an idea. You could plot the GPS coordinates of each sample (intersection) and turn the plot into a 2D image. Then either train a very simple CNN or use a pre-trained CNN with an extra layer on top. This can only work if the images you create actually carry information that distinguishes roundabouts from other intersections.
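
The "points to image" step could be sketched roughly as follows. This is only an illustration under assumptions: the points are taken to be planar (x, y) coordinates already (for example after projecting the latitude/longitude values), and the function name `rasterize` and the default grid size of 32 are hypothetical. It is written in plain Python so it runs without dependencies; `numpy.histogram2d` would do the same binning more efficiently.

```python
def rasterize(points, size=32):
    """Turn a list of (x, y) points into a size x size grid of point counts.

    Hypothetical helper: every sample yields a grid of the same shape,
    regardless of how many points it contains.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)

    # Use a single span for both axes so the shape is not stretched.
    # Fall back to 1.0 if all points coincide.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0

    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        # Clamp so points on the upper edge land in the last cell.
        col = min(int((x - min_x) / span * size), size - 1)
        row = min(int((y - min_y) / span * size), size - 1)
        # Flip the row index so larger y (north) is at the top of the image.
        grid[size - 1 - row][col] += 1
    return grid
```

Each intersection then becomes a fixed-size single-channel image that can be fed to a small CNN. With only ~800 samples, a coarse grid and a small network are probably the safer starting point.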