python geospatial forecasting dbscan

Predicting long-term ship positions with Python


I have a data set of historical ship positions with the following columns:

id: the id of the ship
date: the date when the position was recorded (on a daily basis)
size: the size of the ship (categorical with 3 categories)
longitude
latitude
zone: binary (the variable to predict)
destination: the port of destination
heading: a numerical variable indicating the angle of direction of the ship

A typical row looks like this:

id    date        size   longitude   latitude   zone   destination   heading
123   20/04/2017  PMX    26.3565     -15.7474   True   NYC           36.7654

Based on some criteria, I was able to identify, for each ship, the set of distinct trajectories it made in the past. I stored this in a new feature called trajectory, and I also created a velocity variable. My new dataframe looks like this:

id    date        size   longitude   latitude   zone   destination   heading   trajectory
123   20/04/2017  PMX    26.3565     -15.7474   True   NYC           36.7654   1
123   21/04/2017  PMX    29.3556     -18.7498   True   NYC           46.7654   1
123   15/05/2017  PMX    36.8760     12.3449    False  CHINA         78.7640   2
...   ...         ...    ...         ...        ...    ...           ...       ...
567   13/04/2017  SFD    17.8687     16.8787    False  Balb          23.3232   3

I have to implement a classification algorithm that says whether or not a ship will pass through the zone during the next 30 days. I've read some papers on DBSCAN clustering with customized distances between trajectories, but those were about predicting positions. Is there a simpler way to solve this problem?


Solution

  • For typical oceangoing cargo ships, 30 days is basically one journey, sometimes two.

    The paths used tend to be rather similar, because they are known to be optimal (modulo routing around storms). These routes will be consistent not just for the same ship, but among all ships of roughly similar size.

    So one approach would be to build a library of routes from your historical data, clustering not positions but paths. If the origin and destination are the same or similar, you should check how similar the routes are. "CHINA" is not a sufficiently precise destination, so if that's your real input data, you should discard that column and produce your own destination by checking which port is near the actual last position on each journey.

    Once they are away from land, cargo ship speeds are somewhat uniform, so the predicted route should be enough to predict position on each day of the journey. And of course once you start, you'll be able to test your predictor on the data you already have.

    The smaller your target zones are, the harder this will be. Hopefully they are quite large.
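
As a starting point, here is a minimal sketch of the route-library idea using only the standard library. It is not a full path-clustering solution. It relabels each journey's origin and destination with the nearest port to its first and last recorded positions, groups journeys by (origin, destination, size), and uses the fraction of past journeys on each route that entered the zone as a crude probability. The port list, its approximate coordinates, the function names, and the sample rows are all made up for illustration. Dates are assumed to be dd/mm/yyyy strings as in the question.

```python
import math
from collections import defaultdict
from datetime import datetime

# Hypothetical port list with approximate (latitude, longitude) values.
# Replace with a real port database.
PORTS = {
    "NYC": (40.68, -74.04),
    "SHANGHAI": (31.23, 121.49),
    "ROTTERDAM": (51.95, 4.14),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_port(lat, lon, ports=PORTS):
    """Name of the port closest to the given position."""
    return min(ports, key=lambda name: haversine_km(lat, lon, *ports[name]))


def summarize_journeys(rows):
    """Collapse daily position rows into one record per (id, trajectory)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["id"], row["trajectory"])].append(row)

    journeys = []
    for (ship_id, traj), pts in groups.items():
        # Sort by actual date; dd/mm/yyyy strings do not sort correctly as text.
        pts.sort(key=lambda r: datetime.strptime(r["date"], "%d/%m/%Y"))
        first, last = pts[0], pts[-1]
        journeys.append({
            "id": ship_id,
            "trajectory": traj,
            "size": first["size"],
            "origin": nearest_port(first["latitude"], first["longitude"]),
            "destination": nearest_port(last["latitude"], last["longitude"]),
            "entered_zone": any(p["zone"] for p in pts),
        })
    return journeys


def zone_rate_by_route(journeys):
    """Fraction of historical journeys on each route that entered the zone."""
    counts = defaultdict(lambda: [0, 0])  # route -> [zone hits, total journeys]
    for j in journeys:
        key = (j["origin"], j["destination"], j["size"])
        counts[key][0] += j["entered_zone"]
        counts[key][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


# Invented sample rows: two ships sailing roughly from New York to Rotterdam.
sample_rows = [
    {"id": 1, "date": "01/04/2017", "size": "PMX", "latitude": 40.6, "longitude": -74.0, "zone": False, "trajectory": 1},
    {"id": 1, "date": "10/04/2017", "size": "PMX", "latitude": 45.0, "longitude": -30.0, "zone": True, "trajectory": 1},
    {"id": 1, "date": "18/04/2017", "size": "PMX", "latitude": 51.9, "longitude": 4.1, "zone": False, "trajectory": 1},
    {"id": 2, "date": "02/04/2017", "size": "PMX", "latitude": 40.7, "longitude": -74.1, "zone": False, "trajectory": 1},
    {"id": 2, "date": "19/04/2017", "size": "PMX", "latitude": 52.0, "longitude": 4.2, "zone": False, "trajectory": 1},
]

rates = zone_rate_by_route(summarize_journeys(sample_rows))
```

To score a ship that is currently under way, you would look up the rate for its (origin, destination, size) route and threshold it. Holding out the most recent journeys lets you check the predictor against data you already have. If journeys on the same route split into clearly different paths, that is the point where comparing path shapes within a route becomes worthwhile.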