
Correct xtics and keys with columnstacked histograms from CSV (Gnuplot)


I'm trying to visualize accumulated working hours for different projects in rows (Foo Stuff, Bar Stuff, ...), recorded by employees in columns (alice, bob, ...) in a CSV file like this:

id,title,alice,bob,charlie,diego
foo,Foo Stuff,2,,3,1
bar,Bar Stuff,1,5,,
baz,Baz Stuff,,1,8,5

My goal is a stacked bar histogram with the employees on the x axis and the accumulated working hours per employee on the y axis. My current approach is this gnuplot script (I'm using 5.4.3):

set datafile separator ','

set style data histograms
set style histogram columnstacked
set style fill solid noborder
set boxwidth 0.75
set xtics rotate by 90 right
set grid ytics linestyle 0

set key outside
set xlabel "Employee"
set ylabel "Working hours"

plot for [COL=3:*] 'example.csv' using COL title columnhead

[Image: plot produced by the script above]

I'm new to gnuplot and it's not that easy for me to extract everything from the documentation. My most important questions here are:

  1. How do I get the xtic labels right? There's a duplicate "diego" label after the last column.
  2. How do I get a legend with the project names (2nd column)?

Solution

  • Regarding your questions:

    1. It looks like the open-ended loop for [COL=3:*] creates this duplicate "diego". This looks like a bug to me (possibly only in combination with the columnstacked histogram style).

    You can avoid it by

    • either using for [COL=3:6] if you know the number of columns beforehand,
    • or first determining the number of columns with stats, which stores it in the variable STATS_columns (check help stats).
    2. Add an invisible plot just for the legend, using with boxes, and store the second column in the variable t, which is then used as the title.
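For the first option, here is a minimal sketch of the question's own script with a hard-coded column range instead of the open-ended one. It assumes the data always has exactly six columns (id, title and the four employees alice, bob, charlie, diego); the inline datablock $Data only replaces example.csv so the snippet is self-contained.

```gnuplot
# Sketch of option 1: fixed column range 3..6 instead of COL=3:*
# Assumption: exactly four employee columns after id and title.
$Data <<EOD
id,title,alice,bob,charlie,diego
foo,Foo Stuff,2,,3,1
bar,Bar Stuff,1,5,,
baz,Baz Stuff,,1,8,5
EOD

set datafile separator ','
set style data histograms
set style histogram columnstacked
set style fill solid noborder
set boxwidth 0.75
set xtics rotate by 90 right
set grid ytics linestyle 0
set key outside
set xlabel "Employee"
set ylabel "Working hours"

# columns 3..6 are alice, bob, charlie, diego; columnhead supplies the xtic labels
plot for [COL=3:6] $Data using COL title columnhead
```

This only fixes the xtic labels; the legend still needs the invisible plot from point 2.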

    Maybe there are shorter and smarter solutions, but use the following example as a starting point. It works for gnuplot>=5.4.0; for older versions other workarounds may be needed.

    Script: (works for gnuplot>=5.4.0)

    ###  stacked bar chart with autocolumns and titles from column
    reset session
    
    $Data <<EOD
    id,title,alice,bob,charlie,diego
    foo,Foo Stuff,2,,3,1
    bar,Bar Stuff,1,5,,
    baz,Baz Stuff,,1,8,5
    EOD
    
    set datafile separator ','
    set style data histograms
    set style histogram columnstacked
    set style fill solid noborder
    set boxwidth 0.75
    set xtics rotate by 90 right
    set grid ytics linestyle 0
    set key outside
    set xlabel "Employee"
    set ylabel "Working hours"
    
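    # number of columns and of records (rows), used as loop limits below
    stats $Data u 0 nooutput
    colMax = STATS_columns
    rowMax = STATS_records
    
    # histogram of columns 3..colMax, plus one invisible plot per project row for the key
    plot for [COL=3:colMax] $Data u COL ti columnhead, \
         for [ROW=1:rowMax-1] '' every ::ROW::ROW u (t=strcol(2),NaN) w boxes lc ROW ti t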
    ### end of script
    

    An attempt to explain the second part of the plot command:

    • it loops from the second row (the first data row) to the last row; row indices in gnuplot are zero-based, so row 0 is the header line
    • every ::ROW::ROW limits it to one single row (check help every)
    • during plotting it assigns column 2 of the current row as string to variable t
    • because of serial evaluation (...,NaN) (check help operators binary), nothing is actually plotted, but a key entry is created nevertheless
    • t is used as the legend title (this only works for gnuplot>=5.4.0)
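To see what the invisible plot feeds into t without drawing anything, the following sketch (an illustration of my own, not part of the solution above) runs the same every ::ROW::ROW and (t=strcol(2), ...) construct through stats instead of plot and prints the captured title for each row. It assumes the same header-plus-three-rows data, so ROW runs from 1 to 3; the variable titles is a hypothetical helper used only to collect the results.

```gnuplot
# Sketch: inspect the title captured per row via serial evaluation, no plot needed
$Data <<EOD
id,title,alice,bob,charlie,diego
foo,Foo Stuff,2,,3,1
bar,Bar Stuff,1,5,,
baz,Baz Stuff,,1,8,5
EOD
set datafile separator ','

titles = ""   # hypothetical accumulator for the captured titles
do for [ROW=1:3] {
    # (t=strcol(2), 0): first assign column 2 as a string to t, then hand stats the value 0
    stats $Data every ::ROW::ROW using (t=strcol(2), 0) nooutput
    print sprintf("ROW %d -> %s", ROW, t)
    titles = titles . t . ";"
}
```

Using stats here just avoids opening a plot window; the assignment to t behaves the same way as in the plot command.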

    I still hope there is a "nicer" solution.

    Result:

    [Image: resulting plot]