Tags: python, tensorflow, keras, scatter-plot, speech

Plotting a t-SNE scatter plot for speech spectrograms


I have spectrograms of speech waveforms belonging to 4 classes. I want to draw a t-SNE scatter plot to visualize how the speech files are distributed among the four classes. How can I do this with t-SNE?


Solution

  • Say you have your spectrogram data as an array of shape (n_points, n_dims), together with the associated labels.

    Here I will generate my own:

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.manifold import TSNE
    
    n_points = 50
    n_classes = 4
    n_dims = 150
    
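    # Generate synthetic data: one random class label per point
    labels = np.random.randint(n_classes, size=n_points)
    X = np.random.normal(size=(n_points, n_dims))
    # Shift every point by its class index so the classes are separable
    X = X + labels[:, np.newaxis]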
    

    Then you can simply apply t-SNE to your data to reduce it to two dimensions and plot it.

    # Run t-SNE: embed the (n_points, n_dims) data into 2 dimensions
    X_embedded = TSNE(n_components=2).fit_transform(X)
    
    # Plot one scatter series per class so each gets its own color and legend entry
    names = ['class_1', 'class_2', 'class_3', 'class_4']
    for i in range(n_classes):
        X_label = X_embedded[labels == i]
        plt.scatter(X_label[:, 0], X_label[:, 1], label=names[i])
    plt.legend()
    plt.show()
    

    [Figure: t-SNE plot of the clusters]
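
Real spectrograms are usually 2-D and differ in length, so they first have to be brought into the (n_points, n_dims) shape assumed above. Here is a minimal sketch of one way to do that, assuming every spectrogram is a 2-D array of shape (n_freq_bins, n_time_frames) with the same number of frequency bins. The helper name spectrograms_to_matrix and the n_frames parameter are hypothetical, and cropping or zero-padding the time axis is only one possible choice.

```python
import numpy as np


def spectrograms_to_matrix(spectrograms, n_frames):
    """Stack 2-D spectrograms into one (n_points, n_dims) array.

    Hypothetical helper: assumes each spectrogram has shape
    (n_freq_bins, n_time_frames) with identical n_freq_bins.
    """
    rows = []
    for spec in spectrograms:
        spec = np.asarray(spec, dtype=float)
        # Crop recordings that are longer than n_frames.
        spec = spec[:, :n_frames]
        # Zero-pad the time axis of recordings that are shorter.
        pad = n_frames - spec.shape[1]
        spec = np.pad(spec, ((0, 0), (0, pad)))
        # Flatten the 2-D spectrogram into a single feature vector.
        rows.append(spec.ravel())
    return np.stack(rows)


# Stand-in data (an assumption): 3 random "spectrograms" with 8 frequency bins
rng = np.random.default_rng(0)
specs = [rng.normal(size=(8, t)) for t in (10, 14, 12)]
X = spectrograms_to_matrix(specs, n_frames=12)
print(X.shape)  # (3, 96)
```

The resulting X can be passed to TSNE(n_components=2).fit_transform(X) exactly as in the answer above. Note that scikit-learn requires the perplexity parameter (30 by default) to be smaller than the number of samples, so with only a handful of files you would need to lower it.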