Tags: python, python-3.x, scikit-learn, k-means

Getting a weird error that says 'Reshape your data either using array.reshape(-1, 1)'


I am testing this code.

# Import the necessary packages
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from sklearn.cluster import KMeans
# Define a normalizer
normalizer = Normalizer()
# Create Kmeans model
kmeans = KMeans(n_clusters=10, max_iter=1000)
# Make a pipeline chaining normalizer and kmeans
pipeline = make_pipeline(normalizer, kmeans)
# Fit pipeline to daily stock movements
pipeline.fit(score)
labels = pipeline.predict(score)

This line throws an error:

pipeline.fit(score)

Here is the error that I see:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I don't know what this error means. I Googled it and didn't find anything useful. Here is a small sample of my data:

array=[1. 1. 1. ... 8. 1. 1.].

I am following the example from the link below.

https://medium.com/datadriveninvestor/stock-market-clustering-with-k-means-clustering-in-python-4bf6bd5bd685

When I run the code from the link, everything works fine. I'm not sure why it fails when I run the code on my own data, which is just:

1, 1.9, 2.62, 3.5, 4.1, 7.7, 9.75, etc.

The values range from 1 to 10. That's all it is.


Solution

  • scikit-learn estimators such as Normalizer and KMeans expect a 2D array of shape (n_samples, n_features). There are two scenarios in which you have to reshape your data:

    • If you have only a single sample, reshape it to an array of shape (1, n_features) with array.reshape(1, -1).
    • If you have only a single feature, reshape it to an array of shape (n_samples, 1) with array.reshape(-1, 1).

    Which reshape you need depends on what your data represents. Right now you are passing a 1D vector:

    [1. 1. 1. ... 8. 1. 1.]
    

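    If this is a single sample, reshape it with reshape(1, -1). That fixes the shape error, but with that said, you might want to think about the following.

    • If this is a single sample, there's no point in fitting a model to it. In fact KMeans(n_clusters=10) cannot be fitted to one sample at all: it requires at least as many samples as clusters and raises a ValueError otherwise.
    • If this is a set of samples with a single feature (which your values between 1 and 10 suggest), reshape it with reshape(-1, 1) instead. Even then, I don't see much benefit in doing K-means on such a dataset: in one dimension it just splits the number line into intervals. Also, the Normalizer in this pipeline rescales each row to unit length, so with a single feature every positive value becomes 1.0 and all samples collapse onto a single point.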
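Here is a minimal sketch of both cases, assuming score is a 1D NumPy array of values between 1 and 10. The random data is a hypothetical stand-in, not your actual data. It shows the shapes involved, the Normalizer turning a single positive feature into 1.0, the ValueError when fitting 10 clusters on one sample, and KMeans fitted on the reshaped values with the Normalizer dropped (dropping it is a suggestion for the single-feature case, not something the linked tutorial does).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import Normalizer

# Hypothetical stand-in for `score`: 200 random values between 1 and 10
# (not the question's real data).
rng = np.random.default_rng(0)
score = rng.uniform(1, 10, size=200)
print(score.shape)  # (200,): 1D, which is what triggers the reshape error

# Single feature: one row per value, one column -> shape (200, 1).
X = score.reshape(-1, 1)
print(X.shape)  # (200, 1)

# Normalizer scales each row to unit L2 norm. With one positive feature,
# every row becomes [1.0], so all samples collapse onto the same point.
normalized = Normalizer().fit_transform(X)
print(np.allclose(normalized, 1.0))  # True

# Single sample: shape (1, 200). KMeans needs n_samples >= n_clusters,
# so fitting 10 clusters on one sample raises a ValueError.
single_sample_failed = False
try:
    KMeans(n_clusters=10, n_init=10).fit(score.reshape(1, -1))
except ValueError as err:
    single_sample_failed = True
    print("ValueError:", err)

# KMeans on the reshaped raw values, without the Normalizer.
kmeans = KMeans(n_clusters=10, max_iter=1000, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels.shape)  # (200,)
```

Passing n_init explicitly just keeps the behavior the same across scikit-learn versions whose default changed. Whether 10 clusters on a single feature are meaningful is a separate question; the sketch only shows that the shapes line up.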