I am trying to plot a set of paired data in a correlation graph. It's a study with two field treatments and 8 levels of inputs to judge a plant response. I want to display the data showing 8 different colors representing the 8 input levels and 3 different shapes to represent 3 different years of the study. I am using gscatter.
The problem is that when I specify two variables to group by, it doesn't know that colors go with one variable and shapes with the other, and that I only want color to change when the input changes, and only shape to change when the year changes. The end result is that for each unique grouping (24 of them), it advances through both the colors AND shapes at the same time with each unique combination.
Here are two graphs that illustrate the output. This first graph shows all the data just grouped and colored by inputs only. It correctly assigns a unique color to each input level. There are 2 points for each of 3 years, for a total of 6 points per color.
Now I just want to change the shape of the 2nd and 3rd years and keep the colors the same, but this is what I get. You can see in the legend how it cycles through the colors 3 times and the shapes 8 times, both at the same time. So different colors get assigned to the same input level. I tried sorting the data in different ways, but I get exactly the same results.
I also tried manually adjusting the colors for each data point, but the order in the object must be different than listed in the legend, because I get odd results. Some points are changed correctly, but others aren't.
There must be a better way to do this. I'm open to all suggestions either getting this method to work, or using a different function.
Here is the code with a smaller sub-set of the data:
clear
Treatments = table([{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'};{'T1'};{'T2'}]);
Data = [2016 2016 2017 2017 2018 2018 2016 2016 2017 2017 2018 2018 2016 2016 2017 2017 2018 2018;...
0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2;...
4704.5 4059.5 10891 11440.5 4083.5 2876 11459.66667 11752 11566 12036 11323.5 11118.5 10296.5 10234 13074.5 14166 9062 9669]';
% split by treatment
t1Response = Data(strcmp(Treatments.Var1,'T1'),3);
t2Response = Data(strcmp(Treatments.Var1,'T2'),3);
Inputs = Data(strcmp(Treatments.Var1,'T2'),2); % treatment doesn't matter, just need one set
Years = Data(strcmp(Treatments.Var1,'T2'),1); % treatment doesn't matter
% all points one shape, group colors just by inputs
figure;
colors = lines(8);
colors(8,1)=0.5;
g = gscatter(t1Response,t2Response,Inputs,colors([7,2,3],:),'.',20,'on');
% group by input and year
figure;
g2 = gscatter(t1Response,t2Response,{Inputs,Years},colors([7,2,3],:),'.s^',20,'on');
g2(1).MarkerFaceColor = colors(7,:);
% g2(1).MarkerFaceColor = colors(7,:);
% g2(2).MarkerFaceColor = colors(7,:);
% g2(3).MarkerFaceColor = colors(7,:);
% g2(4).MarkerFaceColor = colors(2,:);
% g2(5).MarkerFaceColor = colors(2,:);
% g2(6).MarkerFaceColor = colors(2,:);
% g2(7).MarkerFaceColor = colors(3,:);
% g2(8).MarkerFaceColor = colors(3,:);
% g2(9).MarkerFaceColor = colors(3,:);
You may have to ditch gscatter and do this yourself. From the documentation item on multiple grouping variables:
Alternatively, g can be a cell array containing several grouping variables (such as {g1 g2 g3}), in which case observations are in the same group if they have common values of all grouping variables.
i.e. it can't handle independent variables with independent colours and shapes.
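To see why this yields one colour/marker pairing per combination, here is a minimal sketch that counts the groups both ways. It uses findgroups (not gscatter itself) purely as an illustration of the "common values of all grouping variables" rule, with the Inputs and Years values copied from the question's sample data:

```matlab
% Values copied from the question's sample data (one row per observation)
Inputs = [0 0 0 1 1 1 2 2 2]';
Years  = [2016 2017 2018 2016 2017 2018 2016 2017 2018]';

% Grouping on both variables together: one group per unique (input, year) pair
[gBoth, idInput, idYear] = findgroups(Inputs, Years);
nCombined = max(gBoth);

% Grouping on each variable separately
gInputsOnly = findgroups(Inputs);
gYearsOnly  = findgroups(Years);

fprintf('%d combined groups, %d inputs, %d years\n', ...
    nCombined, max(gInputsOnly), max(gYearsOnly));
```

For this sample there are 9 combined groups but only 3 inputs and 3 years, so styling has to be driven by the two separate group indices rather than by the combined one.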
You can use findgroups and some indexing to build up the groups, then loop over the years for plotting. Here is a robust solution which handles cases like having more groups than colours/markers:
% Define colours and markers
colors = lines(8); colors(8,1)=0.5;
markers = {'x','d','o','+','*','s','p','h'};
% Create colour matrix
[gInputs, uInputs] = findgroups(Inputs);
if max(gInputs) > size(colors,1)
warning( 'More inputs than possible colors, colors will be re-used' );
end
colors = colors(mod(gInputs-1,size(colors,1))+1,:); % mod to handle out of range case
% Create marker array
[gYears, uYears] = findgroups(Years);
if max(gYears) > numel(markers)
warning( 'More years than possible markers, markers will be re-used' );
end
markers = markers(mod(gYears-1,numel(markers))+1); % mod to handle out of range case
figure(); hold on
for iYr = 1:max(gYears)
idx = (iYr == gYears);
scatter(t1Response(idx), t2Response(idx), 20, colors(idx,:), 'Marker', markers{iYr}, 'displayname', num2str(uYears(iYr)), 'LineWidth', 2 );
end
hold off
legend('show')
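The mod-based indexing above is what lets the styles wrap around when there are more groups than styles. A small sketch of that step in isolation, using a deliberately short, hypothetical marker list and hypothetical group numbers (neither comes from the question's data):

```matlab
markers = {'x','d','o'};   % hypothetical: only three markers available
gYears  = [1 2 3 4 5]';    % hypothetical: group numbers for five years

% Shift to 0-based, wrap with mod, shift back to 1-based indices
wrapped = mod(gYears - 1, numel(markers)) + 1;

% One marker per group; groups 4 and 5 re-use the first two markers
perGroupMarkers = markers(wrapped);
disp(perGroupMarkers)
```

Subtracting 1 before the mod and adding it back afterwards keeps the indices in the range 1 to numel(markers), which plain mod(gYears, numel(markers)) would not (it returns 0 for exact multiples).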
If you want the legend to reflect the combinations of year and input, then you will have to use a double loop:
% Setup as above...
figure(); hold on
for iYr = 1:max(gYears)
for iIn = 1:max(gInputs)
idx = (iYr == gYears) & (iIn == gInputs);
scatter(t1Response(idx), t2Response(idx), 20, colors(idx,:), 'Marker', markers{iYr}, 'displayname', sprintf('%d: %d',uYears(iYr),uInputs(iIn)), 'LineWidth', 2 );
end
end
hold off
legend('show')
You could probably make the last example quicker using line instead of scatter, but the syntax is slightly different so I've left it to avoid complication.
To get a more concise legend, you would have to spoof some lines with the formats you want. I've restructured the code a bit to show how this can be done:
% Define colours and markers
colors = lines(8); colors(8,1)=0.5;
markers = {'x','d','o','+','*','s','p','h'};
% Create colour matrix
[gInputs, uInputs] = findgroups(Inputs);
if max(gInputs) > size(colors,1)
warning( 'More inputs than possible colors, colors will be re-used' );
end
% Create marker array
[gYears, uYears] = findgroups(Years);
if max(gYears) > numel(markers)
warning( 'More years than possible markers, markers will be re-used' );
end
% handle index out of range
markers = markers(mod((1:max(gYears))-1,numel(markers))+1);
colors = colors(mod((1:max(gInputs))-1,size(colors,1))+1,:);
% Plot each (year, input) combination, hidden from the legend
lineProps = {'markersize', 5, 'linestyle', 'none', 'LineWidth', 2};
figure(); hold on
for iYr = 1:max(gYears)
for iIn = 1:max(gInputs)
idx = (iYr == gYears) & (iIn == gInputs);
line(t1Response(idx), t2Response(idx), 'color', colors(iIn,:), 'Marker', markers{iYr}, 'handlevisibility', 'off', lineProps{:} );
end
end
% Spoof markers for the legend
for iYr = 1:max(gYears)
line( NaN, NaN, 'color', 'k', 'Marker', markers{iYr}, 'displayname', num2str(uYears(iYr)), lineProps{:} );
end
for iIn = 1:max(gInputs)
line( NaN, NaN, 'color', colors(iIn,:), 'Marker', 'o', 'markersize', 20, 'displayname', num2str(uInputs(iIn)), lineProps{:} );
end
hold off
legend('show')