
Gnuplot: Creating a histogram with gnuplot


I would like to create a histogram of the occurrences of text values in a single-column dataset using gnuplot. How can I do this? An example of the dataset:

UDP
TCP
TCP
UDP
ICMP
ICMP
ICMP
TCP

Solution

  • There are similar questions, e.g. "gnuplot automatic stack bar graph", but they differ slightly from this one. The following example creates some random test data. If you already know the keywords and want them in a certain order, skip the step of creating a unique list and define Uniques = '...' yourself. It can help to enclose the items in double quotes in case some keywords contain spaces.
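
    A minimal sketch of such a hand-written list, assuming the keywords are known in advance. The item "Some Protocol" is hypothetical and only there to show a keyword containing a space; word() and words() treat a double-quoted item as a single word.

    ```gnuplot
    # hand-written keyword list in the desired order
    # ("Some Protocol" is a made-up keyword, used only to show an item with a space)
    Uniques = '"UDP" "TCP" "ICMP" "Some Protocol"'

    N = words(Uniques)       # number of items; a quoted item counts as one word
    print N
    print word(Uniques,4)    # fourth item of the list
    ```

    With the list defined this way, the addToList/stats step of the script is not needed and the order of the bars follows the order of the list.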

    • create a unique list of your keywords
    • define a lookup function by (mis)using the sum function (see help sum)
    • plot with the smooth frequency option (see help smooth frequency), using the lookup index as x
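
    The lookup step can be tried on its own. The sketch below uses the same Lookup definition as the script, but with a short hand-written list (an assumption made for illustration) instead of one built from the data.

    ```gnuplot
    # short keyword list, written by hand for this illustration
    Uniques = '"UDP" "TCP" "ICMP"'
    N = words(Uniques)

    # sum evaluates the ternary for every _i from 1 to N; the term whose keyword
    # equals s stores its index in _idx as a side effect, and the comma operator
    # then returns _idx as the value of the function
    Lookup(s) = (sum [_i=1:N] (s eq word(Uniques,_i) ? _idx=_i : 0), _idx)

    print Lookup("TCP")     # position of TCP in the list
    print Lookup("ICMP")    # position of ICMP in the list
    ```

    Note that a string which is not in the list leaves _idx unchanged, so Lookup would return the index from the previous call (or fail if _idx was never set). This is not an issue in the script because every value in the column is in the list by construction.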

    Script: (works with gnuplot>=5.0.0)

    ### histogram: occurrences of keywords
    reset session
    
    # create some random test data
    myKeywords = 'UDP TCP ICMP ABC WWW NET COM FTP HTTP HTTPS'
    set print $Data
        do for [i=1:3000] {
            print word(myKeywords,int(rand(0)*10)+1)
        }
    set print
    
    # create a unique list of strings from a column
    addToList(list,col) = list.( strstrt(list,'"'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
    Uniques = ''
    stats $Data u (Uniques=addToList(Uniques,1),'') nooutput
    
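    N = words(Uniques)
    # lookup: the term of the sum that matches s stores its index in _idx,
    # the comma operator then returns _idx
    Lookup(s) = (sum [_i=1:N] (s eq word(Uniques,_i) ? _idx=_i : 0), _idx)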
    
    set xrange [1:N]
    set xtics out
    set ylabel "Counts"
    set grid x,y
    set offsets 0.5,0.5,0.5,0
    set boxwidth 0.8
    
    set style fill transparent solid 0.5 border
    set key noautotitle
    
    plot $Data u (Lookup(strcol(1))):(1):xtic(1) smooth freq w boxes
    ### end of script
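
    To check the counting without looking at a plot, the same steps can be run on the eight values from the question and the result written to a datablock. This is only a sketch: the names myData and $Counts are introduced here for illustration, and it assumes that the gnuplot version in use supports set table $Counts and print $Counts.

    ```gnuplot
    reset session

    # the eight values from the question, written into a datablock
    myData = 'UDP TCP TCP UDP ICMP ICMP ICMP TCP'
    set print $Data
        do for [i=1:words(myData)] { print word(myData,i) }
    set print

    # unique list and lookup function, as in the script above
    addToList(list,col) = list.( strstrt(list,'"'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
    Uniques = ''
    stats $Data u (Uniques=addToList(Uniques,1),'') nooutput
    N = words(Uniques)
    Lookup(s) = (sum [_i=1:N] (s eq word(Uniques,_i) ? _idx=_i : 0), _idx)

    # write the summed counts into a datablock instead of drawing them
    set table $Counts
        plot $Data u (Lookup(strcol(1))):(1) smooth freq
    unset table

    print Uniques    # keywords in order of first appearance
    print $Counts    # one row per lookup index with its count
    ```

    For this sample the keywords appear in the order UDP, TCP, ICMP, so the rows for index 1, 2 and 3 should carry the counts 2, 3 and 3.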
    

    Result: (image not reproduced here; the script draws one box per keyword, with the counts on the y-axis)