python machine-learning scikit-learn reshape mixture-model

What is the correct way to fit a Gaussian mixture model to single-feature data?


data is a one-dimensional array of values.

data = [0.0, 7000.0, 0.0, 7000.0, -400.0, 0.0, 7000.0, -400.0, -7400.0, 7000.0, -400.0, -7000.0, -7000.0, 0.0, 0.0, 0.0, -7000.0, 7000.0, 7000.0, 7000.0, 0.0, -7000.0, 6600.0, -7400.0, -400.0, 6600.0, -400.0, -400.0, 6600.0, 6600.0, 6600.0, 7000.0, 6600.0, -7000.0, 0.0, 0.0, -7000.0, -7400.0, 6600.0, -400.0, 7000.0, -7000.0, -7000.0, 0.0, 0.0, -400.0, -7000.0, -7000.0, 7000.0, 7000.0, 0.0, -7000.0, 0.0, 0.0, 6600.0, 6600.0, 6600.0, -7400.0, -400.0, -2000.0, -7000.0, -400.0, -7400.0, 7000.0, 0.0, -7000.0, -7000.0, 0.0, -400.0, -7400.0, -7400.0, 0.0, 0.0, 0.0, -400.0, -400.0, -400.0, -400.0, 6600.0, 0.0, -400.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -400.0, -400.0, 0.0, 0.0, -400.0, -400.0, 0.0, -400.0, 0.0, -400.0]

I would like to fit some Gaussians to this data and plot them.

If I run

import numpy as np
from sklearn import mixture

x = np.array(data)
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(x)

I get the error

ValueError: Expected n_samples >= n_components but got n_components = 2, n_samples = 1

and

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

Ok... I can live with this. The warning tells me what to do. However, if I run

x = np.array(data).reshape(-1,1)
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(x)

I get the error

ValueError: Expected the input data X have 1 features, but got 32000 features

What am I doing wrong? What is the right way?

Edit:

I just realized that I misread the error message. It is not fit() that raises the error, but score_samples().

I am trying to plot the gaussians afterwards.

x = np.linspace(-8000,8000,32000)
y = clf.score_samples(x)

plt.plot(x, y)
plt.show()

So x seems to be the problem. However, neither x.reshape(-1,1) nor x.reshape(1,-1) helps.
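
For reference, here is a minimal sketch of what the two reshapes named in the warning produce. It uses only NumPy, and the names col and row are illustrative. One possible pitfall (an assumption about what happened here, not something the question confirms) is that reshape returns a new array and leaves x itself unchanged, so the result has to be assigned or passed on.

```python
import numpy as np

x = np.linspace(-8000, 8000, 32000)
print(x.shape)            # (32000,)  one-dimensional

col = x.reshape(-1, 1)    # 32000 samples, 1 feature
row = x.reshape(1, -1)    # 1 sample, 32000 features
print(col.shape)          # (32000, 1)
print(row.shape)          # (1, 32000)

# reshape does not modify x in place; the result must be kept.
x.reshape(-1, 1)
print(x.shape)            # (32000,)  still one-dimensional
```

The (1, 32000) layout matches the "got 32000 features" wording of the error, while a model fitted on a single feature expects the (32000, 1) layout.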


Solution

  • I found the error myself. As noted in my edit, the error was raised by score_samples(), not by fit().

    Both functions expect a two-dimensional array of shape (n_samples, n_features).

    Working code:

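    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import mixture

    # fit() needs shape (n_samples, n_features): one column for a single feature
    data = np.array(data).reshape(-1,1)
    clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
    clf.fit(data)

    # score_samples() needs the same 2-D shape; it returns the log-likelihood of each point
    x = np.linspace(-8000,8000,32000).reshape(-1,1)
    y = clf.score_samples(x)

    plt.plot(x, y)
    plt.show()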