Scatter plot with 3 categorical variables in gnuplot


With data like this

Sensitivity,Recall,ID,Param1,Param2
0.89,0.551,run1,A1,alpha1
0.93,0.78,run1,A2,alpha2
0.54,0.76,run1,A2,alpha3
0.95,0.99,run2,A1,alpha1
0.354,0.445,run3,A1,alpha1
0.89,0.72,run4,A2,alpha1

I would like to reproduce this Julia plot:

[image]

Separating the plots by ID can be done by preprocessing. However, I cannot figure out how to map color to one categorical column and marker shape to another, and have both appear in the legend.

My first idea was to concatenate Param1 and Param2 into a single column (like "A1-alpha1"). After splitting the data into two blocks, like this:

#A2
Sensitivity,run1-alpha2,run1-alpha3,run4-alpha1
0.93,0.78,,
0.54,,0.76,
0.89,,,0.72


#A1
Sensitivity,run1-alpha1,run2-alpha1,run3-alpha1
0.89,0.551,,
0.95,,0.99,
0.354,,,0.445

The code would be

set datafile separator comma
set terminal pngcairo
set output "test2.png"
set multiplot layout 1,2
set xlabel "A1"
plot for [i=2:4] 'test2.csv' index "A1" u 1:i title columnhead(i) ps 3
set xlabel "A2"
plot for [i=2:4] 'test2.csv' index "A2" u 1:i title columnhead(i) ps 3
unset multiplot

[image]

Is there a better way, especially one that gives coherent labels in the legend? Thanks.


Solution

  • As I understand it, you have 3 parameters and want to plot the data

    • into a multiplot-subplot depending on a first parameter
    • with a color depending on a second parameter
    • with a pointtype depending on a third parameter

    You could do this in loops using a filter function: if a condition is met, it returns the value of a certain column, otherwise NaN. gnuplot does not draw points whose value is NaN, so each plot command only shows the rows that match.

    The example below assumes that your parameters are known beforehand and are listed in strings. There are also ways to let gnuplot build the parameter lists automatically, which might be useful if your parameters vary from datafile to datafile.

    For further information check help variable, help word, help words, help key, help keyentry. The example below can certainly be further tuned.

    Script:

    ### plot data into multiplots with variable pointtype and color
    reset session
    
    $Data <<EOD
    Sensitivity,Recall,ID,Param1,Param2
    0.89,0.551,run1,A1,alpha1
    0.93,0.78,run1,A2,alpha2
    0.54,0.76,run1,A2,alpha3
    0.95,0.99,run2,A1,alpha1
    0.354,0.445,run3,A1,alpha1
    0.89,0.72,run4,A2,alpha1
    EOD
    
    set datafile separator comma
    set key noautotitle reverse Left
    graphs = "A1 A2"
    keys1  = "run1 run2 run3 run4"
    keys2  = "alpha1 alpha2 alpha3"
    
    # return column colD only if the graph, color and pointtype columns all match, otherwise NaN
    myFilter(colD,colG,valG,colP1,valP1,colP2,valP2) = (strcol(colG) eq word(graphs,valG)) && \
        (strcol(colP1) eq word(keys1,valP1)) && (strcol(colP2) eq word(keys2,valP2)) ? column(colD) : NaN
    myPt(i)    = i*2+3
    myColor(i) = int(word("0xff0000 0x00ff00 0x0000ff 0xff00ff 0xffff00 0x00ffff",i))
    
    set xrange[0.3:1]
    set yrange[0.4:1]
    
    set multiplot layout 1,2 margins 0.05,0.8,0.1,0.9
        do for [g=1:words(graphs)] {
            set key at screen 0.95, 0.9-0.06*(g-1)*words(keys1)
            set title word(graphs,g)
            plot for [i=1:words(keys1)] for [j=1:words(keys2)] $Data skip 1 \
                u 1:(myFilter(2,4,g,3,i,5,j)):(myPt(j)):(myColor(i)) w p pt var ps 3 lc rgb var, \
                for [i=1:words(keys1)*(g==1)] keyentry w boxes fs solid 1.0 lc rgb myColor(i) ti word(keys1,i), \
                for [i=1:words(keys2)*(g==2)] keyentry w p ps 2 pt myPt(i) lc "grey50" ti word(keys2,i)    
        }
    unset multiplot
    ### end of script
    

    Result:

    [image: the resulting plot with two panels, A1 and A2]
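
As mentioned above, the parameter lists could also be built by gnuplot itself instead of being typed in. The following is only a minimal sketch of one way to do that, not part of the script above: the helper name addToList() is made up for this illustration, and it assumes that the values contain no spaces, since the lists are space-separated strings read with word(). It runs stats over the data once and appends each value to its list the first time it is seen.

```gnuplot
### sketch: collect the unique values of some columns into space-separated strings
$Data <<EOD
Sensitivity,Recall,ID,Param1,Param2
0.89,0.551,run1,A1,alpha1
0.93,0.78,run1,A2,alpha2
0.54,0.76,run1,A2,alpha3
0.95,0.99,run2,A1,alpha1
0.354,0.445,run3,A1,alpha1
0.89,0.72,run4,A2,alpha1
EOD

set datafile separator comma

# hypothetical helper: append the string in column col to list unless it is already listed
# (entries are delimited by spaces on both sides, so values must not contain spaces)
addToList(list,col) = list.(strstrt(list,' '.strcol(col).' ') ? '' : strcol(col).' ')

graphs = ' '
keys1  = ' '
keys2  = ' '

# the trailing 0 is a dummy numeric value for stats; the assignments do the actual work
stats $Data skip 1 u (graphs=addToList(graphs,4), keys1=addToList(keys1,3), \
    keys2=addToList(keys2,5), 0) nooutput

print graphs
print keys1
print keys2
### end of sketch
```

The three strings then hold the distinct values of Param1, ID and Param2 in order of first appearance, and could replace the hard-coded graphs, keys1 and keys2 in the script. Note that myColor() only lists six colors, so it would need more entries if a file contains more than six distinct IDs.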