Plotting new points in a subspace after dimensionality reduction


I would like to plot points, each with 100 parameters taking values between 0 and 99, on a 2-dimensional plot. This should be straightforward with standard methods of dimensionality reduction (PCA, t-SNE, UMAP, etc.), but I need to be able to add subsequent points to the plot without the projection being recalculated and the plot therefore changing.

I am picturing an algorithm that takes a data point with its 100 values and converts it to X, Y coordinates that can then be plotted, such that points proximal in the 2D projection are also proximal in the original 100D space. Does such an algorithm exist? If not, are there any alternative approaches?

Thanks


Solution

  • I am not sure I understood the question correctly, but given an initial set X, we can fit a PCA once to compute the principal components. We can then reuse those same principal components to transform new samples, without refitting.

    from sklearn.decomposition import PCA
    import numpy as np
    import matplotlib.pyplot as plt
    
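    n_samples, n_feats = 50, 100
    # Values in 0..99; the upper bound of randint is exclusive
    X = np.random.randint(0, 100, size=(n_samples, n_feats))
    
    # Fit once on the initial data, then reuse the fitted model
    pca = PCA(n_components=2).fit(X)
    X_reduced = pca.transform(X)
    
    # Plot the 2D projection, not the first two raw features
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1])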
    

    This plots:

    [Figure: Plotting of X]

    Then, when a new sample comes in, we project it with the already-fitted PCA:

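    # One new point with 100 values in 0..99 (randint's upper bound is exclusive)
    new_sample = np.random.randint(0, 100, size=(1, 100))
    # Project with the already-fitted PCA; no refit, so existing points stay put
    new_sample_reduced = pca.transform(new_sample)
    plt.scatter(new_sample_reduced[:, 0], new_sample_reduced[:, 1], color="red")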
    

    The new sample then appears in red on the same plot:

    [Figure: Plotting of new sample]
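
To see why existing points do not move when a new one is added, it may help to write the fitted projection out by hand. The sketch below assumes scikit-learn's default `whiten=False`, in which case `pca.transform` amounts to centering with the stored `mean_` and multiplying by the stored `components_`. The helper name `project` and the seeded generator are illustrative only and are not part of the answer above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
X = rng.integers(0, 100, size=(50, 100))  # 50 samples, 100 features in 0..99

# Fit once; mean_ and components_ are frozen from here on
pca = PCA(n_components=2).fit(X)
before = pca.transform(X)

def project(points, fitted_pca):
    """Hypothetical helper: manual equivalent of transform (whiten=False)."""
    return (points - fitted_pca.mean_) @ fitted_pca.components_.T

# A new point is mapped by the same fixed linear map
new_sample = rng.integers(0, 100, size=(1, 100))
new_xy = pca.transform(new_sample)
manual_xy = project(new_sample, pca)

# Transforming the original data again gives the same coordinates
after = pca.transform(X)
```

Because the map is fixed after `fit`, any number of later points can be added to the same axes. The trade-off is that the components only reflect the initial set, so a linear projection fitted on few samples may not keep 100D neighbours close in 2D.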