I have spectrograms of speech waveforms belonging to 4 classes. I want to draw a t-SNE scatter plot to visualize how the speech files are distributed among the four classes. How can I do this with t-SNE?
Say you have your spectrogram data as an array of shape (n_points, n_dims),
along with the associated labels.
Here I will generate my own:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE
n_points = 50
n_classes = 4
n_dims = 150
# Generate data
labels = np.random.randint(n_classes, size=n_points)
X = np.random.normal(size=(n_points, n_dims))
X = (X.transpose() + labels).transpose()
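If your real spectrograms are 2-D arrays (frequency bins by time frames), they first have to be brought into the (n_points, n_dims) shape mentioned above. Here is a minimal sketch of one way to do that, assuming every spectrogram has the same shape; recordings of different lengths would need padding or cropping first. The helper name `flatten_spectrograms` and the example sizes are made up for illustration.

```python
import numpy as np


def flatten_spectrograms(spectrograms):
    """Stack equally shaped 2-D spectrograms into an (n_points, n_dims) array."""
    stacked = np.stack(spectrograms)          # (n_points, n_freq, n_frames)
    return stacked.reshape(len(stacked), -1)  # (n_points, n_freq * n_frames)


# Hypothetical example: 6 spectrograms with 8 frequency bins and 10 frames each
rng = np.random.default_rng(0)
specs = [rng.random((8, 10)) for _ in range(6)]
X_flat = flatten_spectrograms(specs)
print(X_flat.shape)  # (6, 80)
```

The resulting array can be used in place of `X` in the rest of the answer.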
Then you can simply apply t-SNE to your data to reduce it to two dimensions and plot it.
# Do TSNE
X_embedded = TSNE(n_components=2).fit_transform(X)
# Plot
names = ['class_1', 'class_2', 'class_3', 'class_4']
for i in range(n_classes):
    # Select the embedded points that belong to class i
    X_label = X_embedded[labels == i]
    plt.scatter(X_label[:, 0], X_label[:, 1], label=names[i])
plt.legend()
plt.show()
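Two things are worth knowing when you run this on your own data: the t-SNE result changes from run to run unless you fix `random_state`, and recent versions of scikit-learn require `perplexity` (default 30) to be smaller than the number of samples. Below is a rough sketch that handles both. The wrapper `embed_2d` and its defaults are my own choices for illustration, not something scikit-learn provides.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(X, perplexity=30.0, seed=0):
    """Project X of shape (n_points, n_dims) to 2-D with t-SNE."""
    n_points = X.shape[0]
    # Keep perplexity below the number of samples (assumed clamp for small datasets)
    perplexity = min(perplexity, n_points - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(X)


# Hypothetical data: 40 points with 20 features each
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 20))
X_demo_embedded = embed_2d(X_demo)
print(X_demo_embedded.shape)  # (40, 2)
```

Trying a few perplexity values is usually worthwhile, since the apparent cluster structure can depend on it.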