Tags: matlab, graph, data-visualization, grouping, matlab-figure

Group by 2 variables, with unique colors for one and unique shapes for the other


I am trying to plot a set of paired data in a correlation graph. It's a study with two field treatments and 8 levels of inputs to judge a plant response. I want to display the data showing 8 different colors representing the 8 input levels and 3 different shapes to represent 3 different years of the study. I am using gscatter.

The problem is that when I specify two variables to group by, it doesn't know that colors go with one variable and shapes with the other, and that I only want color to change when the input changes, and only shape to change when the year changes. The end result is that for each unique grouping (24 of them), it advances through both the colors AND shapes at the same time with each unique combination.

Here are two graphs that illustrate the output. This first graph shows all the data just grouped and colored by inputs only. It correctly assigns a unique color to each input level. There are 2 points for each of 3 years, for a total of 6 points per color.

Now I just want to change the shape of the 2nd and 3rd years and keep the colors the same, but this is what I get. You can see in the legend how it cycles through the colors 3 times and the shapes 8 times, both at the same time, so different colors get assigned to the same input level. I tried sorting the data in different ways, but I get exactly the same results.

I also tried manually adjusting the colors for each data point, but the order in the object must be different than listed in the legend, because I get odd results. Some points are changed correctly, but others aren't.

There must be a better way to do this. I'm open to all suggestions either getting this method to work, or using a different function.

Here is the code with a smaller sub-set of the data:

clear
Treatments = table([{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'}]);
Data = [2016    2016    2017    2017    2018    2018    2016    2016    2017    2017    2018    2018    2016    2016    2017    2017    2018    2018;...
    0   0   0   0   0   0   1   1   1   1   1   1   2   2   2   2   2   2;...
    4704.5  4059.5  10891   11440.5 4083.5  2876    11459.66667 11752   11566   12036   11323.5 11118.5 10296.5 10234   13074.5 14166   9062    9669]';

% split by treatment
t1Response = Data(strcmp(Treatments.Var1,'T1'),3);
t2Response = Data(strcmp(Treatments.Var1,'T2'),3);
Inputs = Data(strcmp(Treatments.Var1,'T2'),2); % treatment doesn't matter, just need one set
Years = Data(strcmp(Treatments.Var1,'T2'),1); % treatment doesn't matter

% all points one shape, group colors just by inputs
figure;
colors = lines(8);
colors(8,1)=0.5;
g = gscatter(t1Response,t2Response,Inputs,colors([7,2,3],:),'.',20,'on');

% group by input and year
figure;
g2 = gscatter(t1Response,t2Response,{Inputs,Years},colors([7,2,3],:),'.s^',20,'on'); g2(1).MarkerFaceColor = colors(7,:);
% g2(1).MarkerFaceColor = colors(7,:);
% g2(2).MarkerFaceColor = colors(7,:);
% g2(3).MarkerFaceColor = colors(7,:);
% g2(4).MarkerFaceColor = colors(2,:);
% g2(5).MarkerFaceColor = colors(2,:);
% g2(6).MarkerFaceColor = colors(2,:);
% g2(7).MarkerFaceColor = colors(3,:);
% g2(8).MarkerFaceColor = colors(3,:);
% g2(9).MarkerFaceColor = colors(3,:);


Solution

  • You may have to ditch gscatter and do this yourself. From the documentation item on multiple grouping variables:

    Alternatively, g can be a cell array containing several grouping variables (such as {g1 g2 g3}), in which case observations are in the same group if they have common values of all grouping variables.

    i.e. it can't handle independent variables with independent colours and shapes.
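
    As a rough illustration of why the combined grouping explodes, here is a minimal sketch that counts the groups for the sample data. The vectors mirror the T2 rows of the question's subset, and findgroups is used here only to mimic "common values of all grouping variables"; it is an assumption that this matches how gscatter forms its groups internally.

    ```matlab
    % Inputs and Years as they come out of the question's sample data
    Inputs = [0; 0; 0; 1; 1; 1; 2; 2; 2];
    Years  = [2016; 2017; 2018; 2016; 2017; 2018; 2016; 2017; 2018];

    % One group per unique (input, year) combination
    gCombined = findgroups(Inputs, Years);
    % Independent groupings, one per variable
    gInputs = findgroups(Inputs);
    gYears  = findgroups(Years);

    nCombined = max(gCombined);
    nInputs   = max(gInputs);
    nYears    = max(gYears);
    fprintf('%d combined groups vs %d inputs and %d years\n', nCombined, nInputs, nYears);
    ```

    Every combination becomes its own group, so a single colour list and a single marker list both get stepped through once per combination rather than once per variable.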

    You can use findgroups and some indexing to build up the groups, then loop over the years for plotting. Here is a robust solution which handles cases like having more groups than colours/markers:

    % Define colours and markers
    colors = lines(8); colors(8,1)=0.5;
    markers = {'x','d','o','+','*','s','p','h'};
    
    % Create colour matrix
    [gInputs, uInputs] = findgroups(Inputs);
    if max(gInputs) > size(colors,1)
        warning( 'More inputs than possible colors, colors will be re-used' );
    end
    colors = colors(mod(gInputs-1,size(colors,1))+1,:); % mod to handle out of range case
    % Create marker array
    [gYears, uYears] = findgroups(Years);
    if max(gYears) > numel(markers)
        warning( 'More years than possible markers, markers will be re-used' );
    end
    markers = markers(mod((1:max(gYears))-1,numel(markers))+1); % one marker per year group; mod to handle out of range case
    
    figure(); hold on
    for iYr = 1:max(gYears)
        idx = (iYr == gYears); 
        scatter(t1Response(idx), t2Response(idx), 20, colors(idx,:), 'Marker', markers{iYr}, 'displayname', num2str(uYears(iYr)), 'LineWidth', 2 );
    end
    hold off
    legend('show')
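
    The mod expression used above is what lets group indices wrap around a short list. A minimal sketch, with a made-up three-marker palette and five group indices (both purely illustrative):

    ```matlab
    palette = {'x', 'd', 'o'};   % hypothetical short marker list
    g = [1 2 3 4 5];             % hypothetical group indices, more than numel(palette)

    % Shift to 0-based, wrap, shift back to 1-based
    wrapped = mod(g - 1, numel(palette)) + 1;
    picked  = palette(wrapped);

    disp(wrapped)
    disp(picked)
    ```

    Groups 4 and 5 reuse the first two markers, which is the re-use the warnings in the code refer to.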
    


    If you want the legend to reflect the combinations of year and input then you will have to use a double loop

    % Setup as above...
    figure(); hold on
    for iYr = 1:max(gYears)
        for iIn = 1:max(gInputs)
            idx = (iYr == gYears) & (iIn == gInputs); 
            scatter(t1Response(idx), t2Response(idx), 20, colors(idx,:), 'Marker', markers{iYr}, 'displayname', sprintf('%d: %d',uYears(iYr),uInputs(iIn)), 'LineWidth', 2 );
        end
    end
    hold off
    legend('show')
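
    If some year/input combinations never occur in the data, one possible variation is to let findgroups enumerate only the combinations that exist and build the legend labels from those. This is a sketch with illustrative data, not part of the solution above; the label format simply copies the sprintf pattern used in the loop.

    ```matlab
    % Illustrative data only: the combination (input 1, year 2017) never occurs
    Inputs = [0; 0; 1];
    Years  = [2016; 2017; 2016];

    % With two grouping variables, findgroups returns one ID vector per variable,
    % each with one entry per combination that is actually present
    [gCombo, uIn, uYr] = findgroups(Inputs, Years);

    % One 'year: input' label per existing combination
    labels = arrayfun(@(yr, in) sprintf('%d: %d', yr, in), uYr, uIn, 'UniformOutput', false);
    nLabels = numel(labels);
    disp(labels)
    ```

    Looping over 1:max(gCombo) instead of the nested year/input loops would then avoid creating legend entries for empty combinations.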
    

    You could probably make the last example quicker using line instead of scatter, but the syntax is slightly different, so I've left it as is to avoid complication.


    To get a more concise legend, you would have to spoof some lines with the formats you want. I've restructured the code a bit to show how this can be done:


    % Define colours and markers
    colors = lines(8); colors(8,1)=0.5;
    markers = {'x','d','o','+','*','s','p','h'};
    
    % Create colour matrix
    [gInputs, uInputs] = findgroups(Inputs);
    if max(gInputs) > size(colors,1)
        warning( 'More inputs than possible colors, colors will be re-used' );
    end
    % Create marker array
    [gYears, uYears] = findgroups(Years);
    if max(gYears) > numel(markers)
        warning( 'More years than possible markers, markers will be re-used' );
    end
    % handle index out of range
    markers = markers(mod((1:max(gYears))-1,numel(markers))+1);
    colors = colors(mod((1:max(gInputs))-1,size(colors,1))+1,:);
    % Line properties shared by the data and the legend entries
    lineProps = {'markersize', 5, 'linestyle', 'none', 'LineWidth', 2};
    figure(); hold on
    for iYr = 1:max(gYears)
        for iIn = 1:max(gInputs)
            idx = (iYr == gYears) & (iIn == gInputs); 
            % line takes no size argument (that was a scatter leftover); marker size comes from lineProps
            line(t1Response(idx), t2Response(idx), 'color', colors(iIn,:), 'Marker', markers{iYr}, 'handlevisibility', 'off', lineProps{:} );
        end
    end
    % Spoof markers for the legend
    for iYr = 1:max(gYears)
        line( NaN, NaN, 'color', 'k', 'Marker', markers{iYr}, 'displayname', num2str(uYears(iYr)), lineProps{:} );
    end
    for iIn = 1:max(gInputs)
        line( NaN, NaN, 'color', colors(iIn,:), 'Marker', 'o', 'markersize', 20, 'displayname', num2str(uInputs(iIn)), lineProps{:} );
    end
    hold off
    legend('show')
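
    The legend trick can be isolated into a few lines. The sketch below uses made-up coordinates and an invisible figure (so it can run without a display); it assumes, as the code above does, that legend skips objects whose HandleVisibility is 'off'.

    ```matlab
    x = [1 2 3 4];   % illustrative data only
    y = [4 3 2 1];

    fig = figure('Visible', 'off');
    ax = axes(fig);
    hold(ax, 'on')
    % Real data: drawn, but hidden from the legend
    line(ax, x, y, 'LineStyle', 'none', 'Marker', 'o', 'HandleVisibility', 'off');
    % Proxy: NaN coordinates draw nothing, but the object still gets a legend entry
    line(ax, NaN, NaN, 'LineStyle', 'none', 'Marker', 'o', 'Color', 'k', 'DisplayName', '2016');
    hold(ax, 'off')

    lgd = legend(ax, 'show');
    legendEntries = lgd.String;
    disp(legendEntries)
    close(fig)
    ```

    Only the proxy shows up in the legend, which is why the full example needs one proxy per year and one per input rather than one per plotted combination.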