
Julia: Visualizing categorical data on a grid


Sometimes you need to draw categorical values on a regular grid to show how they cover a certain area. In principle, the plot() function is a good fit for this, but the marker size has to be tuned by hand each time to create the illusion of a solid cover. When the size or extent of the plot changes, the old marker size no longer fits and has to be adjusted again. Is there a technique to set this size automatically?

using Plots
using CategoricalArrays
a = [1, 2, 3, 1, 2, 3, 1, 2, 3]
b = [1, 1, 1, 2, 2, 2, 3, 3, 3]
c = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z", "Y", "Z"])
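# markersize = 90 was found by trial and error and must be retuned whenever the plot size or extent changes
plot(a, b, group = c, seriestype = :scatter, aspect_ratio = 1, markersize = 90,
     markershape = :square, markerstrokewidth = 0.0, xlim = (0.5, 3.5), ylim = (0.5, 3.5))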

The result looks fine, except that the cell size has to be readjusted every time so that there are no overlaps or gaps:


As an alternative, I considered heatmap(), but it handles categorical data rather strangely: it maps the categories onto its own continuous color scale. I haven't come across any example where heatmap() produces a map with a clean legend like the one from plot(), so I'm not sure that heatmap() is the right way to go here.

a = b = [1, 2, 3]
c = CategoricalArray(["X" "X" "Y"; "Z" "Y" "Y"; "Z" "Y" "Z"])
heatmap(a, b, c)
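For context, a heatmap colors numbers, so categorical cells end up being drawn through some integer encoding of the categories. A minimal plain-Julia sketch of such an encoding (no packages; `category_codes` is a hypothetical helper, and numbering the levels in sorted order is an assumption that matches the default level order for these strings):

```julia
# Hypothetical helper: map each categorical value to an integer code,
# numbering the distinct values in sorted order (1, 2, 3, ...).
function category_codes(c::AbstractArray{<:AbstractString})
    lv = sort(unique(vec(c)))                     # distinct levels, sorted
    index = Dict(l => i for (i, l) in enumerate(lv))
    return lv, map(x -> index[x], c)              # codes, same shape as c
end

c = ["X" "X" "Y"; "Z" "Y" "Y"; "Z" "Y" "Z"]
lv, codes = category_codes(c)
# lv    == ["X", "Y", "Z"]
# codes == [1 1 2; 3 2 2; 3 2 3]
```

Once the categories are plain numbers like these, nothing tells the color scale that 1, 2 and 3 are unordered labels, which is why a continuous gradient is a natural default.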


Is there perhaps some way to set the cell size in plot() automatically?


Solution

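  • There are various ways to create such a plot within Plots.jl. Perhaps the most direct interpretation of what you want is to draw each cell as a shape (seriestype=:shape), sized in data coordinates so that the cells tile the grid regardless of the figure size. For that approach, you also need to know how to put several unconnected polygons into the same group: separate them with a row of NaN. A solution based on shapes could look like this: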

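    using Plots, CategoricalArrays

    a = [1, 2, 3, 1, 2, 3, 1, 2, 3]
    b = [1, 1, 1, 2, 2, 2, 3, 3, 3]
    c = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z", "Y", "Z"])
    
    # Collect the (x, y) cell centres belonging to each category
    groups = Dict(cat => NTuple{2,Int}[] for cat in levels(c))
    for (ca, cb, cat) in zip(a,b,c)
        push!(groups[cat], (ca,cb))
    end
    
    # Turn every cell into a closed square of width w; the NaN row after each
    # square separates the polygons so that one series can hold many of them
    w = 1
    shapes = map(collect(groups)) do (cat, vals)
        cat => mapreduce(vcat, vals) do (ca, cb)
            [ca cb] .+ [-.5 -.5; .5 -.5; .5 .5; -.5 .5; -.5 -.5; NaN NaN]*w
        end
    end
    
    # One :shape series per category, in level order, so each gets a legend entry
    p = plot(aspect_ratio=1)
    for (cat, s) in sort(shapes;by=x->x[1])
        plot!(s[:,1], s[:,2], label=cat, seriestype=:shape, linewidth=0)
    end
    p  # the for loop returns nothing, so show the plot explicitly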
    


    Most of the code simply rearranges the data into a Vector of Pairs, each mapping a categorical value to a matrix of the vertices of all its cells (one closed square per cell, separated by NaN rows). For "X" this looks like:

    "X" =>
    12×2 Matrix{Float64}:
       0.5    0.5
       1.5    0.5
       1.5    1.5
       0.5    1.5
       0.5    0.5
     NaN    NaN
       1.5    0.5
       2.5    0.5
       2.5    1.5
       1.5    1.5
       1.5    0.5
     NaN    NaN
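The same data preparation can be sketched in plain Julia without Plots or CategoricalArrays, using String categories instead of a CategoricalArray. `cell_vertices` and `group_shapes` are hypothetical helper names, not part of any library:

```julia
# Vertices of an axis-aligned square of width w centred at (x, y),
# closed back to its first corner and followed by a NaN separator row.
function cell_vertices(x, y; w = 1.0)
    offsets = [-0.5 -0.5; 0.5 -0.5; 0.5 0.5; -0.5 0.5; -0.5 -0.5]
    return vcat([x y] .+ offsets .* w, [NaN NaN])
end

# Group cell centres by category, then stack each group's squares into one matrix.
function group_shapes(a, b, c; w = 1.0)
    groups = Dict{String,Vector{Tuple{Int,Int}}}()
    for (x, y, cat) in zip(a, b, c)
        push!(get!(groups, cat, Tuple{Int,Int}[]), (x, y))
    end
    return Dict(cat => reduce(vcat, [cell_vertices(x, y; w = w) for (x, y) in cells])
                for (cat, cells) in groups)
end

a = [1, 2, 3, 1, 2, 3, 1, 2, 3]
b = [1, 1, 1, 2, 2, 2, 3, 3, 3]
c = ["X", "X", "Y", "Z", "Y", "Y", "Z", "Y", "Z"]
shapes = group_shapes(a, b, c)
size(shapes["X"])  # (12, 2): two cells, six rows each
```

Because the squares are defined in data coordinates, they tile the grid exactly whatever the figure size, which is what removes the manual markersize tuning; a width `w < 1` would deliberately leave gaps between cells.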
    

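    A perhaps slightly simpler solution is to "trick" Plots into displaying what we want with a heatmap: color the cells with the default palette, hide the colorbar, and add an empty :shape series per category so that each one gets a legend entry: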

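    using Plots, CategoricalArrays

    a = b = [1, 2, 3]
    c = CategoricalArray(["X" "X" "Y"; "Z" "Y" "Y"; "Z" "Y" "Z"])
    pal = palette(:default)
    p = plot(aspect_ratio=1, size=(400,400))
    # Use the palette as the heatmap's colors; with clims=(1, length(pal))
    # the level code k of each category should land on the k-th palette color
    heatmap!(a,b,c, c=pal, colorbar=false, clims=(1,length(pal)))
    # Empty :shape series, only there to create one legend entry per category
    for cat in sort(collect(Set(c)))
        plot!(
            [], [], seriestype=:shape,
            label=cat, color=pal[levelcode(cat)]
        )
    end
    p  # show the finished plot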
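Why clims=(1, length(pal)) lines the colors up with the legend: assuming the n palette colors become evenly spaced stops of the heatmap's color gradient, an integer code k in 1:n is normalized to exactly the position of the k-th stop. A small arithmetic check in plain Julia (`stop_index` is a hypothetical helper, and n = 16 is only an illustrative palette length):

```julia
# Normalize value v into [0, 1] using color limits (lo, hi), as a continuous
# color scale does, then return the nearest of n evenly spaced stops (1-based).
function stop_index(v, lo, hi, n)
    t = (v - lo) / (hi - lo)              # position along the gradient, 0..1
    return round(Int, t * (n - 1)) + 1
end

n = 16                                    # illustrative palette length
all(stop_index(k, 1, n, n) == k for k in 1:n)   # code k hits stop k
stop_index(2, 1, 3, n)                    # with limits (1, 3) instead: stop 9
```

Under the same assumption, if the limits were left to follow the data range, here (1, 3), code 2 would land halfway along the gradient instead of on the second palette color, so the heatmap and the legend entries would disagree.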
    
