gnuplot: simple beeswarm example


I have been struggling to reproduce a basic beeswarm plot from page 62 in this doc. I suspect it skips some details, and I don't know what data was actually used. In particular, I think the problem is mapping a categorical (string) variable to an x-axis value.

I used this data:

A 1
A 2
A 3
B 4
B 5
B 6

With this script:

set terminal png
set output "graph.png"
set jitter
plot "data.csv" using 1:2:1 with points lc variable

I get this error:

"graph_script" line 4: warning: Skipping data file with no valid points

plot "data.csv" using 1:2:1 with points lc variable
                                                   ^
"graph_script" line 4: x range is invalid

In their demos gallery, I see something like set xtics ("A" -1, "B" 0), which might help me label already-numeric data better, but what if my data isn't numeric to begin with?

Do I need something like (hash_string_to_large_int($1) % 2)? There must be an easier way!


Solution

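  • As mentioned in the comments, you have to "convert" your keys into numbers in order to plot them: using 1 expects a numeric column, so the strings A and B yield no valid points and the x range ends up invalid. You can do the conversion by building a list of the unique keywords and defining a function that returns each keyword's index in that list.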

    • First, the following example creates some random test data.
    • The code after that knows nothing about the keywords, so it builds the unique list from scratch from the data itself.

    There may be a simpler gnuplot-only solution that I am not aware of.

    Code:

    ### bee-swarm plot with string keys
    reset session
    
    # create some random test data
    myExts = '.py .sh .html'
    set print $Data
        do for [i=1:100] {
            print sprintf("%s %d",word(myExts,int(rand(0)*3)+1),int(rand(0)*10+1)*5)
        }
    set print
    
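    # create a unique list of strings from a data stringcolumn
    # (each new key is appended in double quotes, e.g. ' ".py" ".sh"')
    Uniques = ''
    addToList(list,col) = list.( strstrt(list,'"'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
    # stats reads every row, so the assignment inside using() runs once per row
    stats $Data u (Uniques = addToList(Uniques,1),0) nooutput
    
    # return the 1-based position of key in Uniques (NaN if key is not in the list)
    getIdx(key) = (_idx=NaN, sum [_i=1:words(Uniques)] (word(Uniques,_i) eq key ? _idx=_i : 0), _idx)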
    
    set offsets 0.5,0.5,1,1
    set key noautotitle
    
    set multiplot layout 1,2
    
        set title "No jitter"
        plot $Data u (idx=getIdx(strcol(1))):2:(idx):xtic(word(Uniques,idx)) w points pt 7 lc var
    
        set title "With jitter"
        set jitter
        replot
    unset multiplot
    ### end of code
    

    Result:

    [Image: two panels side by side, titled "No jitter" and "With jitter"]
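
    For the small data file in the question, the same idea can probably be reduced further. The following is only a sketch, assuming the keys are known in advance and can be listed by hand; Keys and keyIdx are names made up for this illustration, not anything built into gnuplot. It reads the question's "data.csv" and uses xtic(1) to take the tic labels straight from the string column.

    ### sketch: string keys mapped to x positions, keys listed by hand
    reset session

    # assumption: the keys are known in advance, in the order they should appear
    Keys = 'A B'
    # 1-based position of s in Keys (NaN if s is not listed)
    keyIdx(s) = (_k=NaN, sum [_i=1:words(Keys)] (word(Keys,_i) eq s ? _k=_i : 0), _k)

    set terminal png
    set output "graph.png"
    set offsets 0.5,0.5,1,1
    set key noautotitle
    set jitter
    # x = key index, y = column 2, color = key index, tic label = column 1
    plot "data.csv" u (k=keyIdx(strcol(1))):2:(k):xtic(1) w points pt 7 lc var
    set output
    ### end of sketch

    Note that jitter only displaces points that would otherwise overlap, so with six well-separated points the swarm effect may not be visible; it shows up once several rows share (nearly) the same key and value.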