Tags: python, dataframe, scikit-learn, sklearn-pandas, dbscan

Trajectory clustering using DBSCAN


I'm trying to identify paths in trajectories. I have a trajectory made of (lat, long) points.

Here is my code:

def clustersDBSCAN(data):
    from sklearn.cluster import DBSCAN
    a=data
    coords = a['Long']
    coords['Lat'] = a['Lat']
    coords = coords.to_numpy(coords)
    kms_per_radian = 6371.0088
    epsilon = 0.02 / kms_per_radian
    db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(coords))
    cluster_labels = db.labels_
    a['clusters']=cluster_labels
    return a

My input is a DataFrame that also contains some other variables. When I run my function, it raises the following error:

Traceback (most recent call last):

  File "<ipython-input-160-1bb326319131>", line 19, in <module>
    TestEtude1 = clustersDBSCAN(TestEtude1)

  File "<ipython-input-160-1bb326319131>", line 14, in clustersDBSCAN
    db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(coords))

TypeError: loop of ufunc does not support argument 0 of type float which has no callable radians method

EDIT:

My data looks like this:

        Lat                 Long               Type de point
136701  53.87030526540526   7.305133353275677  1
136702  53.870307858385225  7.305140443133933  0
136703  53.87031363700621   7.305150308822018  0
136704  53.87031595061333   7.305142298625614  0
136705  53.87032064860515   7.305141557055512  0
136706  53.870326088345934  7.305156457965349  2
136707  53.87030945094248   7.305160487693352  1
136708  53.870349819652134  7.305194852863318  0
136709  53.870340745293994  7.305186559915658  0
136710  53.8702835623423    7.305181727204434  0

In the "Type de point" (point type) column, 1 marks the origin of a trajectory and 2 marks its end. Between a 1 and a 2 are the type 0 points, which are the intermediate points, sorted by time.


Solution

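  • The clustering features are the latitude and longitude columns. Because the input is a pandas DataFrame, you can select both columns at once.

    The posted code passes the wrong features. coords starts as the single column a['Long'], which is a Series, and coords['Lat'] = a['Lat'] does not add a second column to it, so what reaches np.radians is not a numeric array of (lat, long) pairs. The TypeError in the traceback is what NumPy raises when a ufunc such as np.radians is applied to an object-dtype array.

    Replace np.radians(coords) with np.radians(data[["Lat","Long"]]) in the fit() call and it should work. The haversine metric expects each row as (latitude, longitude) in radians, so keep that column order.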
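For reference, here is a minimal sketch of the whole function with that change applied. The name clusters_dbscan, the eps_km and min_samples parameters, and the fourth, far-away point in the sample are illustrative assumptions and not part of the original post; the first three rows are taken from the question's data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

KMS_PER_RADIAN = 6371.0088  # mean Earth radius in km, same value as in the question


def clusters_dbscan(data, eps_km=0.02, min_samples=1):
    """Return a copy of `data` with a 'clusters' column of DBSCAN labels."""
    # Select both columns at once: this gives an (n, 2) float array in
    # (latitude, longitude) order, which is what the haversine metric expects.
    coords = data[["Lat", "Long"]].to_numpy(dtype=float)
    db = DBSCAN(
        eps=eps_km / KMS_PER_RADIAN,  # neighbourhood radius converted from km to radians
        min_samples=min_samples,
        algorithm="ball_tree",
        metric="haversine",
    ).fit(np.radians(coords))
    result = data.copy()  # leave the caller's DataFrame untouched
    result["clusters"] = db.labels_
    return result


# First three rows of the question's data, plus one hypothetical distant point.
sample = pd.DataFrame(
    {
        "Lat": [53.87030526540526, 53.870307858385225, 53.87031363700621, 54.0],
        "Long": [7.305133353275677, 7.305140443133933, 7.305150308822018, 8.0],
        "Type de point": [1, 0, 0, 0],
    }
)
labeled = clusters_dbscan(sample)
print(labeled["clusters"].tolist())
```

With eps_km=0.02 (20 m) the three nearby points should share one label and the distant point should get its own. Unlike the function in the question, this sketch returns a copy instead of modifying the input DataFrame; that is a design choice of the sketch, not a requirement of DBSCAN.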