Tags: machine-learning, visualization, pca, mnist, dimensionality-reduction

How to plot two sets of high-dimensional data in one visualization plot for comparison?


I am trying to compare my generated samples (i.e. MNIST digit images) from a GAN (Generative Adversarial Network) with the real MNIST data. In my 1st experiment, the GAN training was not successful, so the generated samples are not similar to real MNIST images. In my 2nd experiment, the GAN training was very successful, so the generated samples should overlap well with the real MNIST samples in a visualized plot.

[Example figure: three scatter plots showing the real MNIST distribution alone, GAN1 results vs. real data, and GAN2 results vs. real data]

The above example figure shows what I hope to achieve:

  1. The first figure shows the original real image distribution.
  2. The second figure shows that the results of GAN1 don't overlap well with the real data.
  3. The third figure shows that the results of GAN2 overlap well with the real data.

Could someone provide guidance on a good way to plot something like this with Python, and provide some code, using the following snippet (taken from here) as example data/code?

from sklearn.manifold import TSNE
from keras.datasets import mnist
import numpy as np
import seaborn as sns
import pandas as pd

(x_train, y_train), (_, _) = mnist.load_data()
x_train = x_train[:3000]
y_train = y_train[:3000]
# flatten each 28x28 image into a 784-dimensional vector
x_mnist = np.reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
tsne = TSNE(n_components=2, verbose=1, random_state=123)
z = tsne.fit_transform(x_mnist)
df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:, 0]
df["comp-2"] = z[:, 1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 10),
                data=df).set(title="MNIST data t-SNE projection")

Solution

  • You can try to use dimensionality reduction methods like PCA, t-SNE, LLE or UMAP to reduce the dimension of your images to 2 and plot the resulting points, as you already pointed out.

    Here is some example code in Python:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X_real = ...  # real images, e.g. 1000 images as row vectors
    X_gan = ...   # generated images from the GAN with the same shape
    X = np.vstack([X_real, X_gan])  # stack the two matrices vertically
    # for high-dimensional data it's advisable to reduce the dimension
    # first (e.g. to 50) with PCA before using t-SNE
    X_pca = PCA(n_components=50).fit_transform(X)
    X_embedded = TSNE(n_components=2).fit_transform(X_pca)

    # plot the 2D points with corresponding class and method labels
    plt.scatter(...)
    

    Instead of t-SNE, you can use PCA directly, or one of the other methods mentioned above.
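
    To make the plotting step concrete, here is a minimal end-to-end sketch of how the plt.scatter(...) placeholder could be filled in with seaborn, matching the style of the snippet in the question. The GAN samples are simulated with random noise purely for illustration, and the variable and column names (X_real, X_gan, "dataset") are assumptions, not part of the original answer:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from keras.datasets import mnist
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    (x_train, _), (_, _) = mnist.load_data()
    # 1000 real images, flattened to 784-dimensional vectors and scaled to [0, 1]
    X_real = x_train[:1000].reshape(1000, -1).astype("float32") / 255.0
    # placeholder for the generator output; replace with your GAN samples
    X_gan = np.random.rand(1000, 784).astype("float32")

    X = np.vstack([X_real, X_gan])
    labels = ["real"] * len(X_real) + ["GAN"] * len(X_gan)

    X_pca = PCA(n_components=50).fit_transform(X)   # pre-reduce before t-SNE
    X_embedded = TSNE(n_components=2, random_state=123).fit_transform(X_pca)

    df = pd.DataFrame({"comp-1": X_embedded[:, 0],
                       "comp-2": X_embedded[:, 1],
                       "dataset": labels})
    sns.scatterplot(x="comp-1", y="comp-2", hue="dataset", data=df,
                    alpha=0.5).set(title="Real vs. generated samples (t-SNE)")
    plt.show()

    With real generator output in place of the noise, the points from a well-trained GAN should mingle with the real cluster, while a poorly trained one should form a separate cloud, which is exactly the visual comparison described in the question.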