python scikit-learn k-means

scikit-learn ValueError: Expected 2D array, got 1D array instead


I am trying my hand at scikit-learn. I have a very simple dataset of timestamps and gas concentrations in ppm. When I fit KMeans on the reading column, I get the error below.

Error:

ValueError: Expected 2D array, got 1D array instead:
array=[396.4 394.  395.8 395.3 404.2 400.6 397.7 401.5 394.7 398.9 402.5 394.6
 401.2 401.  399.  398.5 401.3 401.7 406.5 395.9 401.2 399.8 398.2 401.9
 405.4 396.1 402.8 404.4 402.5 400.9 402.8 397.8 399.7 398.4 403.4 401.4
 393.1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
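
The two reshapes the message mentions differ in which axis holds the samples. A small sketch, using only the first three values from the array above:

```python
import numpy as np

# First three readings from the error message, as a 1D array
readings = np.array([396.4, 394.0, 395.8])

as_samples = readings.reshape(-1, 1)   # 3 samples, 1 feature each
as_features = readings.reshape(1, -1)  # 1 sample, 3 features

print(readings.shape)     # (3,)
print(as_samples.shape)   # (3, 1)
print(as_features.shape)  # (1, 3)
```

For clustering individual readings, the first form (one row per reading) is the one that applies.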

Code:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

data = pd.read_csv(r"myfilepath.csv")
print(data.shape)

kmeans = KMeans(n_clusters = 2, random_state = 0)
X = data['reading']
kmeans.fit(X)
#clusters = kmeans.fit_predict(data)
print(kmeans.cluster_centers_.shape)

Solution

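  • I did some more digging and found that converting my DataFrame to a NumPy array with to_numpy() fixed my problem. The full array is 2D (one row per sample, one column per feature), which is the shape KMeans.fit expects. Note that the data[:-1] slice in the code only drops the last row; it is not what makes the array 2D.

    Updated code: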

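    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans
    
    # CHANGES: convert the DataFrame to a 2D NumPy array (rows = samples, columns = features)
    data = pd.read_csv(r"myfilepath.csv").to_numpy()
    
    print(data.shape)
    
    kmeans = KMeans(n_clusters=2, random_state=0)
    
    # CHANGES: still 2D; this slice keeps every row except the last
    X = data[:-1]
    
    kmeans.fit(X)
    #clusters = kmeans.fit_predict(data)
    print(kmeans.cluster_centers_.shape)
    
    # Plot the first two columns, colored by cluster label, with the centers on top
    plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, s=50, cmap='viridis')
    centers = kmeans.cluster_centers_
    plt.scatter(centers[:, 0], centers[:, 1], s=200, alpha=0.5)
    plt.show()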
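
A smaller change that should also avoid the error is to keep only the reading column, but in 2D form. This is a minimal sketch, not the code from the answer above: the DataFrame is a made-up stand-in for the CSV, the `timestamp` column name is an assumption, and `n_init=10` is set explicitly only to keep behavior the same across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for the CSV: 'reading' comes from the question,
# 'timestamp' is an assumed column name
data = pd.DataFrame({
    "timestamp": pd.date_range("2020-01-01", periods=6, freq="D"),
    "reading": [396.4, 394.0, 395.8, 404.2, 406.5, 405.4],
})

# Double brackets select a one-column DataFrame (2D), not a Series (1D)
X = data[["reading"]]
# Equivalent with NumPy: data["reading"].to_numpy().reshape(-1, 1)

kmeans = KMeans(n_clusters=2, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)

print(X.shape)                        # (6, 1)
print(kmeans.cluster_centers_.shape)  # (2, 1)
```

With a single feature there is no second column to plot, so a scatter like the one above would need something else on the other axis, for example the row index.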