Tags: machine-learning, visualization, pca, mnist, dimensionality-reduction

How to plot two sets of high-dimensional data in one visualization plot for comparison?


I am trying to compare my generated samples (i.e. MNIST digit images) from a GAN (Generative Adversarial Network) with the real MNIST data. In my 1st experiment, the GAN training was not successful, so the generated samples are not similar to real MNIST images. In my 2nd experiment, the GAN training was very successful, so the generated samples should overlap well with the real MNIST samples in a visualized plot.

[Example figure: three scatter plots showing the real MNIST distribution alone, GAN1 results vs. real data, and GAN2 results vs. real data]

The above example figure shows what I hope to achieve:

  1. The first figure shows the original real image distribution.
  2. The second figure shows that the results of GAN1 don't overlap well with the real data.
  3. The third figure shows that the results of GAN2 overlap well with the real data.

Could someone provide guidance on a good way to plot something like this with Python, and provide some code, using the following snippet (taken from here) as example data/code?

from sklearn.manifold import TSNE
from keras.datasets import mnist
import numpy as np
import seaborn as sns
import pandas as pd

(x_train, y_train), (_, _) = mnist.load_data()
x_train = x_train[:3000]
y_train = y_train[:3000]
# flatten each 28x28 image into a 784-dimensional vector
x_mnist = np.reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
tsne = TSNE(n_components=2, verbose=1, random_state=123)
z = tsne.fit_transform(x_mnist)
df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:, 0]
df["comp-2"] = z[:, 1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 10),
                data=df).set(title="MNIST data t-SNE projection")

Solution

  • You can try to use dimensionality reduction methods like PCA, t-SNE, LLE or UMAP to reduce the dimension of your images to 2 and plot the resulting points, as you already pointed out.

    Here is some example code in Python:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X_real = ...  # real images, e.g. 1000 images as row vectors
    X_gan = ...   # generated images from the GAN with the same shape
    X = np.vstack([X_real, X_gan])  # stack the two matrices vertically
    # for high-dimensional data it's advisable to reduce the dimension
    # first (e.g. to 50) with PCA before using t-SNE
    X_pca = PCA(n_components=50).fit_transform(X)
    X_embedded = TSNE(n_components=2).fit_transform(X_pca)

    # plot the 2D points with corresponding class and method labels
    plt.scatter(...)
    

    Instead of t-SNE, you can use PCA directly, or one of the other methods mentioned above.
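
    To make the plotting step concrete, here is a minimal end-to-end sketch of how the plt.scatter(...) placeholder could be filled in with seaborn, matching the style of the snippet in the question. The GAN samples are simulated with random noise purely for illustration, and the variable and column names (X_real, X_gan, "dataset") are assumptions, not part of the original answer:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from keras.datasets import mnist
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    (x_train, _), (_, _) = mnist.load_data()
    # 1000 real images, flattened to 784-dimensional vectors and scaled to [0, 1]
    X_real = x_train[:1000].reshape(1000, -1).astype("float32") / 255.0
    # placeholder for the generator output; replace with your GAN samples
    X_gan = np.random.rand(1000, 784).astype("float32")

    X = np.vstack([X_real, X_gan])
    labels = ["real"] * len(X_real) + ["GAN"] * len(X_gan)

    X_pca = PCA(n_components=50).fit_transform(X)   # pre-reduce before t-SNE
    X_embedded = TSNE(n_components=2, random_state=123).fit_transform(X_pca)

    df = pd.DataFrame({"comp-1": X_embedded[:, 0],
                       "comp-2": X_embedded[:, 1],
                       "dataset": labels})
    sns.scatterplot(x="comp-1", y="comp-2", hue="dataset", data=df,
                    alpha=0.5).set(title="Real vs. generated samples (t-SNE)")
    plt.show()

    With real generator output in place of the noise, the points from a well-trained GAN should mingle with the real cluster, while a poorly trained one should form a separate cloud, which is exactly the visual comparison described in the question.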