I would like to plot the results of a classification and mark the true classes. Basically, what I need is to assign a color to each point based on the value in a string column.
The dataset looks like this:
5.1 3.5 1.4 0.2 Iris-setosa
I ended up with the following script (thanks to the answer to "How to make points one color when a third column equals zero, and another color otherwise, in Gnuplot?"):
set palette model RGB defined (0 "red",1 "blue", 2 "green")
plot 'iris.data' using 1:2:5 notitle with points pt 2 palette
In the original dataset I replaced the string labels with numbers because I don't know how to work with strings in gnuplot. Is there a way to map strings to colors?
Currently the output looks like this:
However, I don't like the gradient palette because a continuous gradient makes no sense for discrete classes. I would prefer a normal legend that shows one color per class together with the class name. Any idea how to do that?
One way to do this is with awk.
Using a data file Data.csv:
5.4452 4.6816 blue
1.2079 9.4082 red
7.4732 6.5507 red
2.3329 8.2996 red
3.4535 2.1937 green
1.7909 2.5173 green
2.5383 7.9700 blue
and this script:
set pointsize 3
plot "< awk '{if($3 == \"red\") print}' Data.csv" u 1:2 t "red" w p pt 2 lc rgb "red", \
     "< awk '{if($3 == \"green\") print}' Data.csv" u 1:2 t "green" w p pt 2 lc rgb "green", \
     "< awk '{if($3 == \"blue\") print}' Data.csv" u 1:2 t "blue" w p pt 2 lc rgb "blue"
you get this plot:
awk simply checks the third column of each line and prints the line only if it matches the given value (red, green or blue), so each plot element receives only the points of one class and gets its own legend entry.
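For comparison, here is a minimal sketch of the same filter done inside gnuplot itself, without calling awk (handy where awk is not installed, e.g. on a stock Windows system). It assumes gnuplot 4.4 or newer for strcol(); rows of the other classes get an undefined y value (1/0) and are not drawn. The explicit lc rgb colors are an addition so that each legend entry is drawn in the color it names.

```gnuplot
set pointsize 3
# strcol(3) returns column 3 as a string; rows that do not match
# get y = 1/0 (undefined), so gnuplot skips them.
plot "Data.csv" using 1:(strcol(3) eq "red"   ? $2 : 1/0) title "red"   with points pt 2 lc rgb "red", \
     "Data.csv" using 1:(strcol(3) eq "green" ? $2 : 1/0) title "green" with points pt 2 lc rgb "green", \
     "Data.csv" using 1:(strcol(3) eq "blue"  ? $2 : 1/0) title "blue"  with points pt 2 lc rgb "blue"
```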
This also gets rid of the gradient palette.
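If a separate legend entry per class is not needed, the string column can also be mapped to colors directly, with no palette and no renumbering of the data. A sketch, assuming the class name is in column 5 of iris.data as in the question and the class names of the UCI Iris data set; classcolor is a hypothetical helper name. With lc rgb variable, gnuplot reads a 24-bit 0xRRGGBB value for each point from the extra using column.

```gnuplot
# Hypothetical helper: map a class name to a 0xRRGGBB integer
# (black for any unexpected label).
classcolor(name) = name eq "Iris-setosa"     ? 0xff0000 : \
                   name eq "Iris-versicolor" ? 0x0000ff : \
                   name eq "Iris-virginica"  ? 0x00ff00 : 0x000000
# The third using column supplies the color of each point.
plot 'iris.data' using 1:2:(classcolor(strcol(5))) notitle with points pt 2 lc rgb variable
```

This draws everything as a single plot element, so the key cannot list the classes; for a per-class legend, one plot element per class is still needed.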
The awk-based script could be shortened further with gnuplot's plot for [...] iteration, so the plot element is written only once for all classes.
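A sketch of that iteration applied to the iris data from the question, which also avoids replacing the labels with numbers. Assumptions: gnuplot 4.4 or newer (for plot for and strcol), whitespace-separated columns with the class name in column 5, and the class names of the UCI Iris data set; the color list is arbitrary. If your iris.data is comma-separated, uncomment the set datafile separator line.

```gnuplot
# set datafile separator ","
classes = "Iris-setosa Iris-versicolor Iris-virginica"
colors  = "red blue green"
# One plot element per class: word(classes, i) picks the i-th class name,
# rows of other classes get y = 1/0 and are skipped, and the title gives
# each class its own legend entry in its own color.
plot for [i=1:words(classes)] 'iris.data' \
     using 1:(strcol(5) eq word(classes, i) ? $2 : 1/0) \
     title word(classes, i) with points pt 2 lc rgb word(colors, i)
```

Adding another class then only means extending the two strings.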