I have a set of points extracted from a movie showing a ball's trajectory. Each point has (x, y) coordinates corresponding to the ball's position in a frame of the movie, and a z coordinate corresponding to the time frame. The representation below is the sum of all extracted points, frame after frame, rendered in the same 3D graph.
And here is a picture of how it looks without the "time" data: this is the camera's standpoint.
The issue is that I also captured some movement from the player in the background (since it is a flat image, the coordinates do not convey that the player is in the background), and even some other balls on another field behind! But when you look at all these points in three dimensions, you can still identify very clearly which points form the trajectory of the main ball and which are noise (the player's movement, the other ball on the field behind, etc.).
I would like to cluster these (x, y, timestamp) points to remove the noise, but the tools I have used so far give no useful results: mean shift and DBSCAN do not work, and neither do Self-Organizing Maps (Kohonen networks). Whatever parameter changes I make, the result is not satisfying at all.
Do you have any idea why my approach is not giving the expected results? Is there a more appropriate way to cluster 3D data?
Here is the set of data that I've extracted from the movie:
http://arbalette.hopto.org/images/forums/shoot_prise_8_objets_detectes.json
The picture posted above is shoot number 2, but there are 10 shoots in total. Each detected object (the ball, and the noise) is in the "objets" object; x and y are the coordinates, and frameNb is the time (the z axis in my picture).
What I've tried, for example with DBSCAN, is the following:
import json
import numpy as np
from sklearn.cluster import DBSCAN

with open("objets_detectes.json") as json_file:
    objets = json.load(json_file)

for shoot in objets:
    positions_spatiales = []
    for objet in objets[shoot]["objets"]:
        point = [objets[shoot]["objets"][objet]["x"],
                 objets[shoot]["objets"][objet]["y"],
                 objets[shoot]["objets"][objet]["frameNb"]]
        positions_spatiales.append(point)
    positions_spatiales = np.array(positions_spatiales)

    # DBSCAN
    dbscan = DBSCAN(eps=30, min_samples=10)
    clusters = dbscan.fit_predict(positions_spatiales)
    unique_clusters = np.unique(clusters)
Then I plot the result for each shoot with matplotlib, and see the failure...
    fig = plt.figure()
    plt.title('Shoot ' + str(shoot) + ', position/time relation - noise analysis and point clustering')
    ax = fig.add_subplot(111, projection='3d')
    colors = plt.cm.tab10(np.linspace(0, 1, len(unique_clusters)))  # one color per cluster
    for color_indice, cluster in enumerate(unique_clusters):
        cluster_points = positions_spatiales[clusters == cluster]
        ax.scatter(cluster_points[:, 0], cluster_points[:, 1], cluster_points[:, 2],
                   c=[colors[color_indice]], label='Cluster ' + str(cluster), marker='o')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('time')
    plt.show(block=True)
Plotted below are three results with the following parameters; in every case the main ball's trajectory is not recognised (1 color = 1 recognised cluster):
with eps = 30: dbscan = DBSCAN(eps=30, min_samples=10)
with eps = 70: dbscan = DBSCAN(eps=70, min_samples=10)
with eps = 120: dbscan = DBSCAN(eps=120, min_samples=10)
I have the following answer so far. It is not great, but it hints that eps and min_samples need to be adjusted to reach an optimum:
To read the data and packages:
import json
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import time
with open("shoot_prise_8_objets_detectes.json") as json_file:
    objets = json.load(json_file)

shoots = []
x = []
y = []
area = []
frameNumber = []
for shoot in objets:
    for objet in objets[shoot]["objets"]:
        x.append(objets[shoot]["objets"][objet]["x"])
        y.append(objets[shoot]["objets"][objet]["y"])
        area.append(objets[shoot]["objets"][objet]["area"])
        frameNumber.append(objets[shoot]["objets"][objet]["frameNb"])
        shoots.append(shoot)

df = pd.DataFrame(columns=["shoots", "x", "y", "area", "frame number"])
df["shoots"] = shoots
df["x"] = x
df["y"] = y
df["area"] = area
df["frame number"] = frameNumber
The ML part:
fig = plt.figure()
for shoot in df["shoots"].unique():
    subDf = df[df["shoots"] == shoot]
    dataForClustering = subDf[["x", "y", "frame number"]]  # data used for clustering
    scaler = StandardScaler()
    scaledDataForClustering = scaler.fit_transform(dataForClustering)  # scale each axis
    # adjust as necessary...
    eps = 0.6        # tested between 0.5 and 1; this seems to be the optimal range
    min_samples = 2  # tested between 1 and 10; 2-5 seems optimal
    dbscan = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean",
                    algorithm="ball_tree")  # play around with distances and algorithms...
    clusters = dbscan.fit_predict(scaledDataForClustering)  # predict clusters
    clusters_shifted = clusters + 1  # shift so np.bincount accepts the -1 noise label
    cluster_counts = np.bincount(clusters_shifted)  # get count per cluster
    cluster_counts[0] = 0  # ignore the noise bin so it can never be picked as the ball
    most_common_cluster = np.argmax(cluster_counts)  # find the cluster with the highest count
    clusters[clusters != most_common_cluster - 1] = -1  # everything but the most frequent cluster -> -1
    clusters[clusters == most_common_cluster - 1] = 1   # most frequent cluster -> 1
    # the rest is for plotting...
    fig.clf()
    ax = fig.add_subplot(111, projection='3d')
    for cluster in np.unique(clusters):
        label = "noise" if cluster == -1 else "ball"
        clusterPoints = subDf[clusters == cluster]
        ax.scatter(clusterPoints['x'], clusterPoints['y'], clusterPoints['frame number'],
                   label=f'Cluster {label}')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Frame Number')
    ax.set_title('Shoot number: ' + str(shoot))
    ax.legend()
    fig.canvas.draw()
    plt.savefig("soccer/shoot" + str(shoot) + ".png", dpi=330)
    time.sleep(0.1)
A short note about the approach: I assigned the most common cluster as your signal (non-noise) and everything else as noise. I will improve it after lunch... Here are the results so far (for the first shoot; all shoots looked similar):
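The "keep the most common cluster" step can be isolated into a small standalone helper; this is just a sketch (the function name and the noise-bin handling are mine, not from the code above):

```python
import numpy as np

def keep_largest_cluster(labels):
    """Relabel DBSCAN output: largest non-noise cluster -> 1, everything else -> -1."""
    labels = np.asarray(labels).copy()
    counts = np.bincount(labels + 1)  # shift so the -1 noise label fits in bin 0
    counts[0] = 0                     # never pick the noise label itself
    winner = np.argmax(counts) - 1    # shift back to the original label space
    return np.where(labels == winner, 1, -1)

print(keep_largest_cluster([-1, 0, 0, 0, 1, 1, -1]))  # -> [-1  1  1  1 -1 -1 -1]
```

Zeroing the noise bin guards against the edge case where DBSCAN marks more points as noise than belong to any real cluster, which would otherwise make the noise "win".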
V2.0: after lunch and a coffee, some remarks about my previous answer
DBSCAN generated not two clusters but many more. I could see, however, that the noise itself was being split into many different clusters, and that there was always one cluster that defined the trajectory. Hence I decided to identify this cluster and set everything else as noise (-1). This works quite well for some shoots; for others it is not 100% accurate, but that is to be expected with unsupervised learning on a fairly small dataset. Here is one example of this raw clustering with shoot 10:
You can see that the trajectory of the ball is nicely clustered, but the noise is separated into many small clusters. Adjusting the parameters to mitigate this (eps: the maximum distance between two samples for one to be considered in the neighborhood of the other) leads to a worse identification of the ball trajectory. Here are the same results with the grouping I mentioned before, the most common cluster being the ball and everything else noise:
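Instead of scanning eps by hand, a k-distance plot can suggest a value: sort every point's distance to its min_samples-th nearest neighbour and look for the elbow of the curve. Here is a minimal sketch on synthetic data (the trajectory/noise data is made up just to make it runnable; the NearestNeighbors usage is standard scikit-learn, and the elbow still has to be read off by eye):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# synthetic stand-in for one shoot: a smooth parabola-like trajectory plus scattered noise
t = np.linspace(0, 50, 60)
trajectory = np.column_stack([4 * t, 300 - 0.4 * (t - 25) ** 2, t])
noise = rng.uniform([0, 0, 0], [400, 400, 50], size=(40, 3))
points = np.vstack([trajectory, noise])

scaled = StandardScaler().fit_transform(points)

min_samples = 4
# distance of every point to its min_samples-th nearest neighbour
# (kneighbors includes the point itself, matching DBSCAN's counting convention)
nn = NearestNeighbors(n_neighbors=min_samples).fit(scaled)
distances, _ = nn.kneighbors(scaled)
k_distances = np.sort(distances[:, -1])

# the elbow of this sorted curve is a candidate eps;
# printing a few quantiles here instead of plotting ax.plot(k_distances)
for q in (0.5, 0.9, 0.95):
    print(f"{q:.0%} of points have k-distance <= {np.quantile(k_distances, q):.2f}")
```

Trajectory points sit on a dense curve, so their k-distances stay small; the scattered noise dominates the tail of the curve, and eps chosen just below the elbow tends to separate the two.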
Finally, here are all the results as a gif:
As you can see, it works nicely; not for every shoot, but acceptable nonetheless in my opinion.
If you have any question about this approach, please let me know!