Julia scatter plot legend based on colour of point


I am plotting two arrays against each other and basing their point colour on the index in the array using a dictionary. I need to now add a legend but am stuck.

I would value opinions on both a more condensed way of producing this plot from two arrays and a dictionary, and on how to add a correct legend.

This is my MWE (my actual x and y have 7255 entries). You will see the legend is incorrect because I am plotting inside a loop.

using PyPlot
#dictionary: key = index, value = category
my_dict = Dict(1 => "male", 2 => "female", 3 => "male", 4 => "n/a", 5 => "male", 6 => "female")

#assign colour to index based on category in my_dict
function point_colour(idx, my_dict)
    if my_dict[idx] == "male"
        colour = "blue"
    elseif my_dict[idx] == "female"
        colour = "red"
    else
        colour = "black"
    end
    return colour
end

#plot x against y using assigned index colour
#(a single figure call; calling figure() twice opens an extra empty window)
f = figure(figsize=(14,2.5))
x=[1,2,3,4,5,6]
y=[3,4,5,6,7,8]

for i in 1:length(x)
    col=point_colour(i, my_dict)
    plot(x[i],y[i], "o", color=col, label=my_dict[i])
end

plt.legend()
plt.show()





Solution

  • The legend entry is added multiple times because you are plotting each point individually, and every plot call that has a label creates its own legend entry. You have two options:

    1. Group the points by label (e.g. all the male points in one array) and then make one plot call for each of these groups, with the corresponding label.

    2. If you want to keep iterating through a loop as you do now, modify the plotting loop to check whether the legend entry has already been added, with something like:

    labelled = String[] #empty list used to keep track of which categories we have labelled
    for i in 1:length(x)
        col = point_colour(i, my_dict)
        label = "" #an empty label is left out of the legend
        if my_dict[i] ∉ labelled #get the "not in" symbol by typing \notin + tab at the REPL
            label = my_dict[i] #if we haven't already labelled this category, add it to the legend
            push!(labelled, label) #add to our list so we don't label it again
        end
        plot(x[i], y[i], "o", color=col, label=label)
    end
    

    You want each legend entry to be added only once, so the code above checks whether a category has already been labelled. If it has, the point is plotted with an empty label, which keeps it out of the legend, but it still gets the right colour.
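
Option 1 (grouping the points by label) could look something like the sketch below. It reuses the data from the question; `colours` and `indices_by_category` are names made up for this example, not part of PyPlot. The lookup dictionary stands in for the `point_colour` function, with black as the fallback for any category that is not listed.

```julia
using PyPlot

# Same data as in the question
my_dict = Dict(1 => "male", 2 => "female", 3 => "male", 4 => "n/a", 5 => "male", 6 => "female")
x = [1, 2, 3, 4, 5, 6]
y = [3, 4, 5, 6, 7, 8]

# Category => colour lookup (hypothetical name); unlisted categories fall back to black
colours = Dict("male" => "blue", "female" => "red")

# Hypothetical helper: collect the indices that belong to each category
function indices_by_category(categories, n)
    groups = Dict{String,Vector{Int}}()
    for i in 1:n
        # get! returns the existing vector for this category, or stores a new empty one
        push!(get!(groups, categories[i], Int[]), i)
    end
    return groups
end

groups = indices_by_category(my_dict, length(x))

figure(figsize=(14, 2.5))
# One plot call per category, so each label reaches the legend exactly once;
# the keys are sorted only to keep the legend order stable between runs
for category in sort(collect(keys(groups)))
    idx = groups[category]
    plot(x[idx], y[idx], "o", color=get(colours, category, "black"), label=category)
end
legend()
```

When run as a script rather than at the REPL, you may still need the `plt.show()` call from the question. Because there is one plot call per category instead of one per point, this should also be the more condensed and faster route for the full 7255-point data set.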