Tags: matlab, grouping, data-analysis

How do I visualize n-dimensional features?


I have two matrices, A and B. A is a 200-by-1000 double matrix, where each of the 1000 columns is a different feature. A belongs to group 1, and I use ones(200,1) as its label vector. B is also a 200-by-1000 double matrix with the same 1000 features. B belongs to group 2, and I use -1*ones(200,1) as its label vector.

How can I visualize A and B so that the two groups can be clearly distinguished?


Solution

  • I'm assuming each row of A and B is one sample. If I understand you correctly, you want to plot a set of 1000-dimensional vectors, which isn't possible directly. We can't physically visualize anything beyond three dimensions.

    As such, what I suggest is a dimensionality reduction that maps each sample to either 2 or 3 dimensions. Once the data is reduced, you can plot it normally and give each point a marker that depends on the group it belongs to.

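    If you want to achieve this in MATLAB, use Principal Component Analysis, specifically the pca function. It returns the principal component coefficients and the scores, which are the samples projected onto the principal components. Keeping only the first few columns of the scores gives the lower-dimensional representation. I'm assuming you have the Statistics Toolbox (called the Statistics and Machine Learning Toolbox in newer releases)... if you don't, this won't work as written, because pca is not part of base MATLAB.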

    Specifically, given your matrices A and B, you would do this:

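    [coeffA, scoreA] = pca(A); %// scoreA: rows of A in principal-component coordinates
    [coeffB, scoreB] = pca(B); %// Same for B
    numDimensions = 2; %// Number of dimensions to keep
    scoreAred = scoreA(:,1:numDimensions); %// First N columns = reduced data
    scoreBred = scoreB(:,1:numDimensions);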
    

    The second output of pca contains the scores: each sample expressed in principal-component coordinates, with the columns ordered by decreasing variance. To keep N dimensions, extract the first N columns, where N is the number of dimensions you want.
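
    To get a feel for how much information the first N columns keep, you can look at the fifth output of pca, which reports the percentage of total variance explained by each component. This is a minimal sketch on stand-in random data; the variable name cumExplained is made up for illustration:

    ```matlab
    rng(123);                 % seed chosen only for reproducibility
    A = rand(200, 1000);      % stand-in data, same size as in the question

    % Fifth output: percentage of total variance explained per component
    [~, scoreA, ~, ~, explained] = pca(A);

    % Running total, in percent, over the leading components
    cumExplained = cumsum(explained);
    fprintf('First 2 components: %.1f%% of variance\n', cumExplained(2));
    fprintf('First 3 components: %.1f%% of variance\n', cumExplained(3));
    ```

    If the first two or three components explain only a small share of the variance, the 2-D or 3-D plot shows only a small part of the structure in the data.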

    I chose 2 for now, and we can see what it looks like in 3 dimensions after. Once we have what we need for 2 dimensions, it's just a matter of plotting:

    plot(scoreAred(:,1), scoreAred(:,2), 'rx', scoreBred(:,1), scoreBred(:,2), 'bo');
    

    This will produce a plot where the samples from matrix A are drawn as red crosses and the samples from matrix B as blue circles.
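
    One thing to keep in mind with the code above is that pca is called separately on A and B, so each group is centered on its own mean and projected onto its own set of axes. As a sketch of an alternative (not what the code in this answer does), assuming you want both groups in one shared coordinate system, you could stack the matrices, call pca once, and split the scores using the label vectors from the question. The names X, labels, scoreRed and isA are made up for illustration, and the random data is only a stand-in:

    ```matlab
    rng(123);                       % seed chosen only for reproducibility
    A = rand(200, 1000);            % stand-in for group 1
    B = rand(200, 1000);            % stand-in for group 2

    X = [A; B];                     % 400-by-1000, one sample per row
    labels = [ones(size(A,1),1); -1*ones(size(B,1),1)];  % labels from the question

    [coeff, score] = pca(X);        % one PCA: shared mean and shared axes
    numDimensions = 2;
    scoreRed = score(:, 1:numDimensions);

    isA = (labels == 1);            % logical mask selecting group 1
    plot(scoreRed(isA,1),  scoreRed(isA,2),  'rx', ...
         scoreRed(~isA,1), scoreRed(~isA,2), 'bo');
    legend('Group 1 (A)', 'Group 2 (B)');
    xlabel('PC 1'); ylabel('PC 2');
    ```

    With this variant, any separation between the crosses and circles reflects a difference between the groups rather than a difference between two unrelated projections. The rest of this answer continues with the per-matrix version.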

    Here's a sample run given completely random data:

    rng(123); %// Set seed for reproducibility
    A = rand(200,1000); B = rand(200,1000); %// Generate random data
    
    %// Code as before
    [coeffA, scoreA] = pca(A);
    [coeffB, scoreB] = pca(B);
    numDimensions = 2;
    scoreAred = scoreA(:,1:numDimensions);
    scoreBred = scoreB(:,1:numDimensions);
    
    %// Plot the data
    plot(scoreAred(:,1), scoreAred(:,2), 'rx', scoreBred(:,1), scoreBred(:,2), 'bo');
    

    We get this:

    [Figure: 2-D scatter plot of the reduced samples, A as red crosses and B as blue circles]

    If you want three dimensions, simply change numDimensions = 3, then change the plot code to use plot3:

    plot3(scoreAred(:,1), scoreAred(:,2), scoreAred(:,3), 'rx', scoreBred(:,1), scoreBred(:,2), scoreBred(:,3), 'bo');
    grid;
    

    With those changes, this is what we get:

    [Figure: 3-D scatter plot of the reduced samples, A as red crosses and B as blue circles]
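
    If the Statistics Toolbox is not available, the scores can also be computed with base MATLAB. The following is a minimal sketch, assuming the standard formulation of PCA as a singular value decomposition of the mean-centered data. The variable names mu and Ac are made up for illustration, and the signs of individual components may differ from what pca returns:

    ```matlab
    rng(123);                          % seed chosen only for reproducibility
    A = rand(200, 1000);               % stand-in data, same size as in the question

    mu = mean(A, 1);                   % column means, 1-by-1000
    Ac = bsxfun(@minus, A, mu);        % subtract the mean of each feature

    [U, S, V] = svd(Ac, 'econ');       % economy-size SVD: Ac = U*S*V'
    score = U * S;                     % samples in principal-component coordinates

    numDimensions = 2;
    scoreAred = score(:, 1:numDimensions);
    plot(scoreAred(:,1), scoreAred(:,2), 'rx');
    ```

    Here the columns of V play the role of the coefficients and U*S the role of the scores, so the first numDimensions columns can be extracted and plotted exactly as before.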