python keras conv-neural-network cluster-analysis autoencoder

How to get the clustering data (y_true, y_pred) from my code, Keras, python


I'm using a CNN autoencoder to cluster different types of RNA. The clusters are computed from the compressed representations of the RNAs. Every RNA has a label corresponding to its type; in my case there are 7 different classes. After I get the clustering result, I would like to visualize it and see where each RNA clusters, but right now the y_pred value does not correspond to the RNA class but to the cluster that was initialized by kmeans.


kmeans = KMeans(n_clusters=self.n_clusters, n_init=20)
self.y_pred = kmeans.fit_predict(self.encoder.predict(x))
y_pred_last = np.copy(self.y_pred)
self.model.get_layer(name='clustering').set_weights([kmeans.cluster_centers_])
print(kmeans.labels_)

self.y_pred = q.argmax(1)
if y is not None:
    acc = np.round(metrics.acc(y, self.y_pred), 5)
    nmi = np.round(metrics.nmi(y, self.y_pred), 5)
    ari = np.round(metrics.ari(y, self.y_pred), 5)
    loss = np.round(loss, 5)
    logdict = dict(iter=ite, acc=acc, nmi=nmi, ari=ari, L=loss[0], Lc=loss[1], Lr=loss[2])

optimizer = 'adam'
dcec.compile(loss=['kld', 'mse'], loss_weights=[args.gamma, 1], optimizer=optimizer)
dcec.fit(x, y=y, tol=args.tol, maxiter=args.maxiter,
         update_interval=args.update_interval,
         save_dir=args.save_dir,
         cae_weights=args.cae_weights)
y_pred = dcec.y_pred


import csv

with open('datapoints.csv', mode='w', newline='') as data_points:
    data_writer = csv.writer(data_points)
    data_writer.writerow(['id', 'ytrue', 'ypred'])
    truth = y                  # true RNA class labels
    prediction = dcec.y_pred   # cluster index assigned to each sample
    for i in range(len(truth)):
        data_writer.writerow([i, truth[i], prediction[i]])

My problem right now is this part: prediction = dcec.y_pred. The output shows the correct true label, but not the "correct" predicted label. It returns a value, but that value does not correspond to the RNA types.

I don't know if this is the right approach. Mainly, I just want to visualize the clusters and see which RNA types were classified correctly and which were not.


Solution

  • You might not be using the correct function call to get the prediction from the Keras model. I believe you should be doing something like:

    prediction = dcec.predict(x)
    

    Additional details are here: https://keras.io/models/model/

    I hope this helps.
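If the values you get back are still cluster indices rather than RNA classes, that is expected for k-means style clustering: the cluster numbering is arbitrary. One common way to line the two up is to match each cluster to the class it overlaps with most, using the Hungarian algorithm. Below is a minimal sketch of that idea, not code from the DCEC project. It assumes the true labels are integers starting at 0 and uses `scipy.optimize.linear_sum_assignment` (the `maximize` argument needs SciPy 1.4 or later). The function name `map_clusters_to_classes` and the toy arrays are made up for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def map_clusters_to_classes(y_true, y_pred):
    """Relabel cluster indices so they line up with the best-matching class."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    n = max(y_true.max(), y_pred.max()) + 1

    # counts[c, k] = number of samples in cluster c whose true class is k
    counts = np.zeros((n, n), dtype=np.int64)
    for c, k in zip(y_pred, y_true):
        counts[c, k] += 1

    # Hungarian algorithm: one-to-one cluster/class assignment
    # that maximises the total overlap
    cluster_ids, class_ids = linear_sum_assignment(counts, maximize=True)
    mapping = dict(zip(cluster_ids.tolist(), class_ids.tolist()))
    return np.array([mapping[c] for c in y_pred]), mapping


# Toy data (hypothetical): clusters 0/1/2 correspond to classes 2/0/1
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 2, 2, 0, 0])
y_mapped, mapping = map_clusters_to_classes(y_true, y_pred)
```

With your data you would pass `y` and `dcec.y_pred` and write `y_mapped` to the CSV instead of the raw cluster index. Rows where `ytrue` and the mapped prediction differ are then the samples that landed in the "wrong" cluster. Note that the matching is one-to-one, so it works best when the number of clusters equals the number of classes.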