I have a data set of historical ship positions with the following columns:
id : the id of the ship
date : the date when the position was recorded (on a daily basis)
size: the size of the ship (categorical with 3 categories)
longitude
latitude
zone : binary (the variable to predict)
destination: The port of destination
heading : a numerical variable indicating the angle of direction of the ship
A typical row looks like this:
id date size longitude latitude zone destination heading
123 20/04/2017 PMX 26.3565 -15.7474 True NYC 36.7654
Based on some criteria, I was able to identify, for each ship, the distinct trajectories it made in the past, and I stored the result in a new feature called trajectory. I also created a velocity variable. My new dataframe looks like this:
id date size longitude latitude zone destination heading trajectory
123 20/04/2017 PMX 26.3565 -15.7474 True NYC 36.7654 1
123 21/04/2017 PMX 29.3556 -18.7498 True NYC 46.7654 1
123 15/05/2017 PMX 36.8760 12.3449 False CHINA 78.7640 2
... ........ .. ..... ..... .... .... ...... ..
567 13/04/2017 SFD 17.8687 16.8787 False Balb 23.3232 3
I have to implement a classification algorithm that says whether or not a ship will pass through the zone during the next 30 days. I've read some papers about DBSCAN clustering with customized distances between trajectories, but those were aimed at predicting positions. Is there a simpler way to solve this problem?
For typical oceangoing cargo ships, 30 days is basically one journey, sometimes two.
The paths used tend to be rather similar, because they are known to be optimal (modulo routing around storms). These routes will be consistent not just for the same ship, but among all ships of roughly similar size.
So one approach would be to build a library of routes from your historical data, clustering not positions but paths. If the origin and destination are the same or similar, you should check how similar the routes are. "CHINA" is not a sufficiently precise destination, so if that's your real input data, you should discard that column and produce your own destination by checking which port is near the actual last position on each journey.
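As a rough illustration of the two ideas above, here is a minimal Python sketch, not a definitive implementation. It assumes each journey is available as a list of (latitude, longitude) points in date order. The PORTS table, the 50 km threshold, and the function names are all hypothetical and would have to come from your own data. Route similarity is taken here as the mean great-circle distance between the two routes after resampling them to the same number of points, which is only one of several possible path distances.

```python
import math

# Hypothetical port table: name -> (latitude, longitude) in degrees.
PORTS = {
    "New York": (40.68, -74.04),
    "Shanghai": (31.23, 121.49),
    "Rotterdam": (51.95, 4.14),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_port(lat, lon, ports=PORTS, max_km=50.0):
    """Name of the closest port, or None if none lies within max_km (assumed threshold)."""
    name, (plat, plon) = min(
        ports.items(),
        key=lambda kv: haversine_km(lat, lon, kv[1][0], kv[1][1]),
    )
    return name if haversine_km(lat, lon, plat, plon) <= max_km else None

def resample(route, n=20):
    """Linearly interpolate a route to n points, evenly spaced by index.

    Ignores longitude wrap-around at the antimeridian.
    """
    if len(route) == 1:
        return [route[0]] * n
    out = []
    for i in range(n):
        t = i * (len(route) - 1) / (n - 1)
        j = min(int(t), len(route) - 2)
        f = t - j
        (lat1, lon1), (lat2, lon2) = route[j], route[j + 1]
        out.append((lat1 + f * (lat2 - lat1), lon1 + f * (lon2 - lon1)))
    return out

def route_distance_km(route_a, route_b, n=20):
    """Mean distance between corresponding points of two resampled routes."""
    ra, rb = resample(route_a, n), resample(route_b, n)
    return sum(haversine_km(p[0], p[1], q[0], q[1]) for p, q in zip(ra, rb)) / n
```

Calling nearest_port on the last recorded position of each trajectory gives a derived destination. Journeys that share an origin and destination and have a small route_distance_km can then be grouped into one library route.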
Once they are away from land, cargo ship speeds are somewhat uniform, so the predicted route should be enough to predict position on each day of the journey. And of course once you start, you'll be able to test your predictor on the data you already have.
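The prediction step could be sketched along these lines, under some loud assumptions: the library route is a list of (latitude, longitude) waypoints, the ship moves at a constant number of kilometres per day, and the zone is a simple latitude/longitude bounding box. The function names and the zone representation are hypothetical. Your real zone may be a polygon, in which case in_zone would need a point-in-polygon test instead.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def position_after(route, km):
    """Approximate position after travelling km along the route.

    Interpolates linearly in lat/lon within each leg (fine for short legs,
    ignores antimeridian wrap-around) and clamps to the last waypoint.
    """
    for (lat1, lon1), (lat2, lon2) in zip(route, route[1:]):
        leg = haversine_km(lat1, lon1, lat2, lon2)
        if leg > 0 and km <= leg:
            f = km / leg
            return (lat1 + f * (lat2 - lat1), lon1 + f * (lon2 - lon1))
        km -= leg
    return route[-1]

def in_zone(lat, lon, zone):
    """zone is an assumed bounding box: (lat_min, lat_max, lon_min, lon_max)."""
    lat_min, lat_max, lon_min, lon_max = zone
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

def will_enter_zone(route, km_per_day, zone, horizon_days=30):
    """True if any predicted daily position within the horizon falls in the zone."""
    return any(
        in_zone(*position_after(route, km_per_day * day), zone)
        for day in range(horizon_days + 1)
    )
```

Because positions are sampled once per day, a zone narrower than one day's travel can be skipped over entirely. Sampling more finely, or testing whether each daily segment crosses the zone, would be more robust. Running will_enter_zone on past journeys and comparing the result with the recorded zone column gives a direct check of the predictor.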
The smaller your target zones are, the harder this will be. Hopefully they are quite large.