
Matlab scatter plotting - How do I define colours and symbols via look-up table equivalent?


I have been using Matlab off and on over the past few years to produce various figures of isotope data, but have hit a brick wall with one particular figure and standardising the output.

The x,y data will always be two different isotopes.

The categories that will always need to be shown are the Species of animal (Symbol) and the Cluster group (Colour) - since I am interested in movement and mobility of these animals and determining the local, non-local, and mixed isotope values.

I want to be able to have a set of defined Symbols for each Species, e.g. Cattle = circle, Horse = square, etc.

I also want to have a set of defined Colours for each Cluster group.

I first tried using a loop to parse the data by Species and set the Marker, but when I moved on to the Cluster group I realised this was a clunky workaround: if, say, I wanted to set the Colour by the local Lithology instead, I would potentially be rewriting a ton of code.

I downloaded gramm and managed to get close to what I want to do and even managed setting the Lightness by site.

However, since the colours and markers are assigned in order and not tied to a specific Cluster group or Species, the Markers will not be consistent across figures whenever two sites don't share all the same Species.

I've attached a figure using gramm that is really close. I am only missing the ability to map specific markers and colours.

Any guidance would be gratefully received and much appreciated.

[Image: figure produced with gramm]


Solution

  • One option would be to define maps (or dictionaries, which are available in newer versions of MATLAB) for each series attribute you want to use. Then loop over the combinations of the corresponding data groups and plot accordingly.

    For an example, we can generate some data...

    species = {'cattle','horse','sheep'}; % Possible species
    clusters = [3,17; 5.5,14; 7,10];      % Nominal centre of each group's data
    sites = {'GAS','SF','WD'};            % Possible sites
    
    N = 100; % Number of data points
    data = table(); 
    data.species = species( randi(numel(species),N,1) ).'; % Random species from array
    data.Group = randi(size(clusters,1),N,1); % Random group number
    data.Site = sites( randi(numel(sites),N,1) ).'; % Random site from array
    
    % Generate random data which is clustered by the group value, plus a circular noise area
    noise.t = 2*pi*rand(N,1); 
    noise.r = 2*randn(N,1);
    data.x = clusters( data.Group, 1 ) + noise.r.*cos(noise.t);
    data.y = clusters( data.Group, 2 ) + noise.r.*sin(noise.t);
    

    Pretending we don't know how the data was generated, we can now do some post-processing. To start with, define your standard formatting maps. You could do this in a standalone function which is called by all of your project's plotting code, to retain consistency:

    % Specify colours for each group number 1, 2, ...
    groupColours = { ...
        1, [1,0,0]
        2, [0.2,1,0.3]
        3, [0,0,1]
        };
    groupColours = containers.Map( groupColours(:,1), groupColours(:,2) );
    % Specify marker style for each species
    speciesMarkers = { ...
        'cattle', 'o'
        'horse', 's'
        'sheep', 'd'
        'other', 'x'
        };
    speciesMarkers = containers.Map( speciesMarkers(:,1), speciesMarkers(:,2) );
    % Specify "shading" for each site, scale 0 (light) to 1 (dark)
    siteShading = { ...
        'GAS', 0.4
        'SF',  0.7
        'WD',  1.0
        };
    siteShading = containers.Map( siteShading(:,1), siteShading(:,2) );
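
    If your MATLAB release has the dictionary type (R2022b or later, as far as I know), the same look-up tables could be sketched as below. This is an alternative to containers.Map rather than a replacement for the code above. The names markerDict and colourDict, and the fallback to the 'other' entry for unlisted species, are my own assumptions for illustration.

    ```matlab
    % Sketch only: requires a release with the dictionary type (R2022b+).
    % String keys -> string values for the marker styles
    markerDict = dictionary( ["cattle","horse","sheep","other"], ["o","s","d","x"] );
    % Numeric keys -> cell values, because each colour is a 1x3 RGB vector
    colourDict = dictionary( [1,2,3], {[1,0,0], [0.2,1,0.3], [0,0,1]} );

    % Look-ups: a cell-valued dictionary returns a 1x1 cell, so unwrap it
    m = markerDict("horse");   % marker for horse
    c = colourDict(2);
    c = c{1};                  % 1x3 RGB vector for group 2

    % Assumed convention: species missing from the table fall back to 'other'
    sp = "goat";
    if ~isKey( markerDict, sp )
        sp = "other";
    end
    mFallback = markerDict(sp);
    ```

    The fallback means a site with an unexpected species still plots with a consistent marker instead of throwing a missing-key error.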
    

    You might find the function findgroups helpful, which returns the unique values in an array, and the index within those unique values that each row belongs to.

    We can use findgroups on each of the categories you want to style differently. In this case that's the species (marker), group (colour), and site (lightness).

    % Find the unique species, groups, sites, and index of each unique combination
    [idx, nameSpecies, nameGroups, nameSites] = ... 
        findgroups( data.species, data.Group, data.Site );
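
    To see what findgroups hands back, here is a minimal sketch on a toy input. The four rows are made up for illustration and are not part of the data above.

    ```matlab
    % Toy data (made up for illustration): 4 rows, 2 grouping variables
    sp  = {'horse'; 'cattle'; 'horse'; 'sheep'};
    grp = [2; 1; 2; 1];
    [idx, uSp, uGrp] = findgroups( sp, grp );

    % idx(k) is the group number of data row k; row g of (uSp, uGrp)
    % describes the combination that group g stands for
    for g = 1:max(idx)
        fprintf( 'Group %d: %s, cluster %d, %d row(s)\n', ...
            g, uSp{g}, uGrp(g), nnz(idx==g) );
    end
    ```

    Only combinations that actually occur get a group number, so this yields three groups rather than all six possible species/cluster pairs. That is why the plotting loop can simply run from 1 to max(idx).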
    

    Then we can create a figure, loop over these groups, and plot the corresponding data with the correct formatting:

    % New figure
    figure(1); clf;
    % Hold on for multiple series
    hold on
    % Loop over all combinations
    for ii = 1:max(idx)
        % Look up the marker, colour, and site shading for this combination
        marker = speciesMarkers( nameSpecies{ii} );
        colour = groupColours( nameGroups(ii) );
        shading = siteShading( nameSites{ii} );
        % Lighten the colour according to the shading (1 = unchanged)
        shading = min( colour + (1-shading), 1 );
        % This subset of data
        subset = data( ii==idx, : );
        % Use the DisplayName option to automatically generate the
        % legend
        plot( subset.x, subset.y, ...
          'Color', shading, 'MarkerFaceColor', shading, ...
          'Marker', marker, 'LineStyle', 'none', ...
          'DisplayName', sprintf('%s | %d | %s', nameSpecies{ii}, nameGroups(ii), nameSites{ii} ) );
    end
    ylim([7,20]);
    grid on;
    legend( 'show', 'Location', 'eastoutside' );
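
    The lightening step in the loop is easier to follow with numbers plugged in. A small worked sketch, using the red of group 1 and the shading values of the lightest and darkest sites from the maps above (the variable names here are just for illustration):

    ```matlab
    colour   = [1, 0, 0];   % red, as defined for group 1
    shadeGAS = 0.4;         % lightest site
    shadeWD  = 1.0;         % darkest site

    % Add the same offset to every channel, then clip at 1 (white)
    lightRed = min( colour + (1 - shadeGAS), 1 );   % approx. [1, 0.6, 0.6]
    fullRed  = min( colour + (1 - shadeWD),  1 );   % [1, 0, 0], unchanged
    ```

    A shading of 1 leaves the colour as defined, and smaller values push every channel towards white, which is what reads as "lighter" in the plot.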
    

    [Figure: resulting plot, with one legend entry per species | group | site combination]

    The slight pain here is the legend, because the legend will show every combination of species, group, and site, which could grow very quickly. To achieve a legend more similar to the one in your example we can hide all of the "real" series by turning off 'HandleVisibility' and make some spoof series which aren't shown in the plot but are shown in the legend.

    So the plot command becomes:

        plot( subset.x, subset.y, ...
            'Color', shading, 'MarkerFaceColor', shading, ...
            'Marker', marker, 'LineStyle', 'none', ...
            'HandleVisibility', 'off' );
    

    And you need a new block to generate the legend entries:

    % Spoof some data for the legend
    spoofInputs = {NaN,NaN,'k','linestyle','none'};
    plot( spoofInputs{:}, 'marker', 'none', 'DisplayName', 'Species:' );
    for spec = unique(data.species.')
        marker = speciesMarkers( spec{1} );
        plot( spoofInputs{:}, 'marker', marker, ...
            'DisplayName', spec{1} );
    end
    plot( spoofInputs{:}, 'marker', 'none', 'DisplayName', 'Groups:' );
    for grp = unique(data.Group.')
        colour = groupColours( grp );
        plot( spoofInputs{:}, 'marker', 'o', ...
            'Color', colour, 'MarkerFaceColor', colour, ...
            'DisplayName', num2str(grp) );
    end
    plot( spoofInputs{:}, 'marker', 'none', 'DisplayName', 'Sites:' );
    for site = unique(data.Site.')
        shading = siteShading( site{1} );
        shading = [0,0,0] + (1-shading);
        plot( spoofInputs{:}, 'marker', 'o', ...
            'Color', shading, 'MarkerFaceColor', shading, ...
            'DisplayName', site{1} );
    end
    

    Which has a nicer end result:

    [Figure: updated plot with the grouped legend]

    You can see that, thanks to findgroups, the main loop is pretty flexible regardless of how many different formatting criteria you have. You may find it useful, though, to turn the "spoofed" legend series into a small helper function so it can generate arbitrary groups of series formatting.
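
    A minimal sketch of such a helper is below. addLegendGroup is a hypothetical name of my own, not a built-in, and the argument layout (one cell of name-value pairs per legend entry) is just one possible design. It is written as a local function at the end of a script, which needs R2016b or later; otherwise save the function in its own file.

    ```matlab
    % Sketch only: addLegendGroup is a hypothetical helper, not a built-in.
    figure; ax = axes; hold( ax, 'on' );
    addLegendGroup( ax, 'Species:', {'cattle','horse'}, ...
        { {'Marker','o'}, {'Marker','s'} } );
    addLegendGroup( ax, 'Groups:', {'1','2'}, ...
        { {'Marker','o','Color',[1,0,0],'MarkerFaceColor',[1,0,0]}, ...
          {'Marker','o','Color',[0.2,1,0.3],'MarkerFaceColor',[0.2,1,0.3]} } );
    legend( ax, 'show', 'Location', 'eastoutside' );

    function addLegendGroup( ax, heading, labels, styles )
        % Plot NaN points: nothing is drawn on the axes, but each series
        % still gets a legend entry. Assumes "hold on" is already set.
        base = {NaN, NaN, 'Color', 'k', 'LineStyle', 'none'};
        % A marker-less series acts as the heading for this block
        plot( ax, base{:}, 'Marker', 'none', 'DisplayName', heading );
        for k = 1:numel(labels)
            % Later name-value pairs override the defaults in "base"
            plot( ax, base{:}, styles{k}{:}, 'DisplayName', labels{k} );
        end
    end
    ```

    With this, adding a fourth formatting criterion to the legend is one more call rather than another copy of the loop.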