
gnuplot: plot points with colors based on the values in a string column and show the strings in the legend


I would like to plot the results of a classification and mark the true classes. Basically, I need to assign a color to each point based on the value in a string column.

The dataset looks like this:

5.1 3.5 1.4 0.2 Iris-setosa

I ended up with the following script (thanks to the answer to "How to make points one color when a third column equals zero, and another color otherwise, in Gnuplot?"):

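# Map the numeric class labels 0, 1 and 2 to red, blue and green
set palette model RGB defined (0 "red", 1 "blue", 2 "green")
# Column 5 holds the class label and selects each point's palette color
plot 'iris.data' using 1:2:5 notitle with points pt 2 palette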

In the original dataset I replaced the string labels with numbers, because I don't know how to work with strings in gnuplot. Is there a way to map strings to colors?

Currently the output looks like this: [image: gnuplot coloring points]

However, I don't like the gradient palette, because a continuous color scale doesn't make sense for discrete classes. I would prefer a normal legend with one entry per class, showing a single color and the class name. Any idea how to do that?


Solution

  • One way to do this is to filter the data with awk.

    Using a data file Data.csv (space-separated, despite the extension):

    5.4452 4.6816 blue
    1.2079 9.4082 red
    7.4732 6.5507 red
    2.3329 8.2996 red
    3.4535 2.1937 green
    1.7909 2.5173 green
    2.5383 7.9700 blue
    

    and this script:

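    set pointsize 3
    # Each clause pipes Data.csv through awk, which keeps only the rows of one class;
    # the clause title becomes that class's legend entry.
    plot "< awk '{if($3 == \"red\") print}' Data.csv" u 1:2 t "red" w p pt 2, \
         "< awk '{if($3 == \"green\") print}' Data.csv" u 1:2 t "green" w p pt 2, \
         "< awk '{if($3 == \"blue\") print}' Data.csv" u 1:2 t "blue" w p pt 2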
    

    you get this plot:

    [image]

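    awk simply checks the third column of each line in the data file and prints the line only if it matches the requested value, such as red or blue. Each plot clause therefore receives the points of a single class and gets its own legend entry.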

    This also gets rid of the gradient palette.

    The script could be shortened further with gnuplot iterations, which generate one plot clause per class instead of writing each clause by hand.
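A minimal sketch of both ideas above, assuming a POSIX shell with awk (gnuplot is only needed for the optional last step). It recreates the answer's Data.csv, runs the awk filter on its own so you can see which rows one plot clause receives, and then writes the same plot as a single gnuplot iteration. The file name classes.gp and the variable name classes are placeholders of my own, not part of the original answer, and setting the color with lc rgb c only works here because these class names happen to be valid gnuplot color names.

```shell
#!/bin/sh
# Recreate the sample Data.csv from the answer (x, y, class name; space-separated).
cat > Data.csv <<'EOF'
5.4452 4.6816 blue
1.2079 9.4082 red
7.4732 6.5507 red
2.3329 8.2996 red
3.4535 2.1937 green
1.7909 2.5173 green
2.5383 7.9700 blue
EOF

# 1) The filter on its own: print a line only when field 3 is exactly "red".
#    At the shell prompt the inner quotes need no backslashes; the answer escapes
#    them (\"red\") only because the command sits inside a double-quoted gnuplot string.
awk '{if($3 == "red") print}' Data.csv
# A bare awk pattern prints matching lines by default, so this is equivalent:
awk '$3 == "red"' Data.csv

# 2) The same plot as one gnuplot iteration instead of three hand-written clauses.
#    "classes.gp" is only an example file name.
cat > classes.gp <<'EOF'
set pointsize 3
classes = "red green blue"
# One plot clause per word in classes; awk passes through that class only.
# These class names are also gnuplot color names, so they double as lc values.
plot for [c in classes] \
     sprintf("< awk '$3 == \"%s\"' Data.csv", c) \
     using 1:2 title c with points pt 2 lc rgb c
EOF

# Render in the text-only "dumb" terminal as a quick check, if gnuplot is installed.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot -e 'set terminal dumb' classes.gp
fi
```

For an interactive window instead of the text rendering, run gnuplot -persist classes.gp. For labels that are not color names, such as the Iris classes, one option is to iterate by index with for [i=1:words(classes)], taking the label from word(classes, i) and the color from word(colors, i) of a second, hypothetical colors string.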