pandas, numpy, date, k-means, anomaly-detection

ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing


I'm trying to cluster my data with k-means and plot the result, but I'm stuck because one column contains dates, and it's causing a lot of problems (the data is shown in a screenshot).

I've already used to_datetime, so what should I do now?

How can I get past this problem and plot the data?

Thank you in advance!

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


AAPL= pd.read_csv('AAPL.csv', header=0, squeeze=True)
#sd=store_data.head(100)

x = pd.to_datetime(AAPL.iloc[:, [0,1]],dayfirst=True)
print(x)
kmeans4 = KMeans(n_clusters=4)
y_kmeans4 = kmeans4.fit_predict(x)
print(y_kmeans4)
print(kmeans4.cluster_centers_)

plt.scatter(x[:,0],x[:,1],c=y_kmeans4,cmap='rainbow')
plt.scatter(kmeans4.cluster_centers_[:,0] ,kmeans4.cluster_centers_[:,1],color='black')


Solution

  • You need to select only the first column:

    x = pd.to_datetime(AAPL.iloc[:, 0],dayfirst=True)
    

    If you use:

    x = pd.to_datetime(AAPL.iloc[:, [0,1]],dayfirst=True)
    

    it selects the first and second columns and raises the error, because pd.to_datetime accepts several columns (a DataFrame) only when they are named year, month and day, so that it can assemble them into dates.
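The difference between the two calls can be reproduced on a small frame. This is only a sketch: the column names `Date` and `Close` and the sample values are assumptions standing in for the real AAPL.csv, which is not shown in the question. The last step, turning the parsed dates into ordinal numbers, is also an assumption about what you might feed to KMeans, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for AAPL.csv (real column names and values are unknown).
df = pd.DataFrame({
    "Date": ["03/01/2020", "04/01/2020", "05/01/2020"],
    "Close": [74.36, 75.09, 74.29],
})

# One column (a Series) of date strings is parsed without trouble.
dates = pd.to_datetime(df.iloc[:, 0], dayfirst=True)

# Two arbitrary columns (a DataFrame) trigger the ValueError from the question,
# because pandas then looks for columns named year, month and day.
try:
    pd.to_datetime(df.iloc[:, [0, 1]], dayfirst=True)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

# A DataFrame is accepted when its columns carry those names.
parts = pd.DataFrame({"year": [2020, 2020], "month": [1, 2], "day": [3, 4]})
assembled = pd.to_datetime(parts)

# Assumption: to cluster on the date, convert it to a number first
# (here the proleptic Gregorian ordinal) and build a 2-D numeric array.
features = np.column_stack([dates.map(pd.Timestamp.toordinal), df["Close"]])
```

KMeans expects a 2-D numeric array, so something like `features` (rather than the datetime Series itself) is what would be passed to `fit_predict`, and its columns can then be indexed as `features[:, 0]` and `features[:, 1]` for the scatter plot.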