
Gscatter for 3 variables


I am working on discriminant analysis and would like to classify some data using MATLAB. MATLAB's Fisher's Iris Data example (see www.mathworks.com/products/demos/statistics/classdemo.html for details) considers only the first two variables (sepal length and sepal width). I would like to classify using more features, such as petal length and petal width.

Also, the MATLAB function gscatter seems to take only 2 variables.

gscatter(meas(:,1), meas(:,2), species,'rgb','osd');

I would like to include meas(:,3) as well. Any help would be appreciated. Thank you.


Solution

  • You can't do that with gscatter because it only plots 2D data. Since you are adding a third dimension, consider using plot3 to plot the data in 3D instead. What gscatter gives you for free is a separate colour and marker for each group; plot3 can do the same, but it takes a bit more work. Open a new figure, then use a loop to plot the data belonging to each category one group at a time, each with its own colour and marker, calling hold on after plotting so that every group ends up on the same axes.

    The first step is to take the categorical data in species and assign each category a unique ID, so that we can pick out the right rows to plot for each one. Try this:

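    load fisheriris;            % loads meas (measurements) and species (labels)
    [~,~,id] = unique(species); % id(k) is the group number (1 to 3) of row k
    colors = 'rgb';             % one colour per group
    markers = 'osd';            % one marker per group: circle, square, diamond
    
    for idx = 1 : 3
        data = meas(id == idx,:);   % rows belonging to group idx
        plot3(data(:,1), data(:,2), data(:,3), [colors(idx) markers(idx)]);
        hold on;                    % keep earlier groups on the same axes
    end
    grid on; % show a grid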
    

    Let's go through the code slowly. load fisheriris loads the Fisher Iris data you mentioned: meas holds the measurements and species holds the label of each row. The next line uses the third output of unique to assign each string in species an integer ID, so every row gets the number of the category it belongs to. We need this array to separate the rows of meas by category so that each category can be plotted with its own colour and marker. The two character arrays colors and markers store the colour and the marker for each category. The loop then extracts the rows for each ID, plots them with plot3, and builds the line specification by concatenating the corresponding colour and marker. You need hold on so that more than one set of points can be drawn on the same graph; without it, every call to plot3 clears the graph and only the most recently plotted points remain. As a bonus, we add a grid so the graph is easier to read.
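
    To make the ID step concrete, here is a minimal sketch of what the outputs of unique look like on a tiny hand-made label array. The variables labels, names, and isSetosa are made up for illustration and are not part of the Fisher Iris data:

    ```matlab
    % Hypothetical stand-in for species: four labels in arbitrary order
    labels = {'versicolor'; 'setosa'; 'virginica'; 'setosa'};

    % names: the distinct labels, sorted alphabetically
    % id:    for each element of labels, its position in names
    [names, ~, id] = unique(labels);
    % names is {'setosa'; 'versicolor'; 'virginica'}
    % id    is [2; 1; 3; 1]

    % A logical mask then selects the rows of one group,
    % just as id == idx does inside the plotting loop
    isSetosa = (id == 1);   % true for elements 2 and 4
    ```

    Because unique sorts its first output, the IDs follow alphabetical order of the labels, not the order in which the labels first appear.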


    This is what we get:

    [Image: the resulting 3D plot]
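
    If the plot is hard to read without context, the same loop can be extended with axis labels and a legend. This is a sketch, not part of the original answer. It assumes the fisheriris sample data is available and that the first three columns of meas are sepal length, sepal width, and petal length, as described in the question:

    ```matlab
    load fisheriris;                    % provides meas and species
    [names, ~, id] = unique(species);   % names: group labels, id: group index per row
    colors  = 'rgb';
    markers = 'osd';

    figure;
    for idx = 1 : numel(names)
        data = meas(id == idx, :);      % rows belonging to group idx
        plot3(data(:,1), data(:,2), data(:,3), [colors(idx) markers(idx)]);
        hold on;                        % keep earlier groups when plotting the next
    end
    hold off;
    grid on;

    % Axis labels follow the column order assumed above
    xlabel('Sepal length');
    ylabel('Sepal width');
    zlabel('Petal length');

    % Each plot3 call created one line object, so names maps one-to-one to them
    legend(names, 'Location', 'best');
    ```

    Keeping the first output of unique means the legend text comes straight from the data, so nothing has to be typed by hand.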