Clustering dataframe after concatenation of x and y


I have two lists, x and y. x holds three one-element arrays, and y holds three arrays of shape (1, 7), i.e. seven values each:

x= [np.array([6.03437288]), np.array([6.39850922]), np.array([6.07835145])]
y= [np.array([[-1.06565856, -0.16222044,  7.85850477, -2.62498475, -0.46315498,
        -0.33087472, -0.1394244 ]]), 
    np.array([[-1.41487104e+00,  5.81421750e-03,  7.92917001e+00,
        -3.37987517e+00,  1.14685839e-01, -2.91779263e-01,
         2.51753851e-01]]), 
    np.array([[-1.56496814,  0.2612637 ,  7.60577761, -3.55727614,  0.18844392,
        -0.75112678, -0.48055978]])]

I combine x and y into one dataframe:

df = pd.DataFrame({'x': x,'y': y})

Then I tried to cluster this dataframe with k-medoids:

kmedoids = KMedoids(n_clusters=3, random_state=0).fit(df)
cluster_labels = kmedoids.predict(df)

but I got this error:

ValueError: setting an array element with a sequence.

I searched for a solution to this problem but haven't found a concrete one. Any suggestions are welcome, even if they require modifying the code.


Solution

  • Given the arrays x and y as provided in the question:

    import numpy as np
    import pandas as pd
    from sklearn_extra.cluster import KMedoids
    
    df = pd.DataFrame({'x': x,'y': y})
    

    First, concatenate the x and y values of each dataframe row into one flat array:

    df2 = df.apply(lambda r: np.append(r.x, r.y), axis = 1)
    

    Then stack the rows into a single 2-D array X (one row per sample):

    X = np.array(df2.values.tolist())
    

    which can be passed to the clustering method:

    kmedoids = KMedoids(n_clusters=3, random_state=0).fit(X)
    cluster_labels = kmedoids.predict(X)
    

    Result of clustering:

    array([2, 0, 1], dtype=int64)
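
The error in the question arises because each cell of df holds a whole NumPy array, so both columns have object dtype and the estimator cannot convert the frame into a 2-D float array. As a possible alternative to the steps above, the feature matrix could also be built directly from the two lists, skipping the DataFrame. This is only a sketch, assuming x and y keep the shapes shown in the question; it uses np.vstack and np.hstack in place of the apply-based approach from the answer.

```python
import numpy as np

# Data copied from the question: three x arrays of shape (1,),
# three y arrays of shape (1, 7).
x = [np.array([6.03437288]), np.array([6.39850922]), np.array([6.07835145])]
y = [np.array([[-1.06565856, -0.16222044, 7.85850477, -2.62498475,
                -0.46315498, -0.33087472, -0.1394244]]),
     np.array([[-1.41487104e+00, 5.81421750e-03, 7.92917001e+00,
                -3.37987517e+00, 1.14685839e-01, -2.91779263e-01,
                2.51753851e-01]]),
     np.array([[-1.56496814, 0.2612637, 7.60577761, -3.55727614,
                0.18844392, -0.75112678, -0.48055978]])]

# Stack each list into a 2-D block (3x1 and 3x7), then join the blocks
# column-wise so every row is one sample: x first, then the seven y values.
X = np.hstack([np.vstack(x), np.vstack(y)])

print(X.shape, X.dtype)
```

The resulting X can be passed to KMedoids(...).fit(X) in the same way as in the answer. With only three samples and n_clusters=3, each sample ends up in its own cluster, as the result above shows.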