Gnuplot: Creating a histogram with gnuplot

I would like to create a histogram of the occurrences of text values in a single-column dataset using gnuplot. I would like some help, please. An example of the dataset:

UDP
TCP
TCP
UDP
ICMP
ICMP
ICMP
TCP

Solution

There are similar questions, e.g. "gnuplot automatic stack bar graph", but they are still a bit different. The following example creates some random test data. If you already know the keywords and want them in a certain order, skip the step of creating a unique list and define Uniques = '...' yourself. It can be advantageous to enclose the items in double quotes in case you have keywords that include spaces.

1. Create a unique list of your keywords.
2. Define a lookup function by (mis)using the sum function (check help sum).
3. Use the plot option smooth frequency (check help smooth frequency), taking the lookup index as the x value.
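
As a minimal sketch of the last two steps with a hand-written keyword list (so the unique-list step is skipped), the following uses the sample data from the question. The order of the keywords in Uniques and the plain fill style are my own choices for illustration and are not part of the full script:

```gnuplot
### sketch: fixed keyword order with the sample data from the question
reset session

$Data <<EOD
UDP
TCP
TCP
UDP
ICMP
ICMP
ICMP
TCP
EOD

# keyword list written by hand instead of collected from the data
# (assumption: this order is just an example)
Uniques = '"UDP" "TCP" "ICMP"'
N = words(Uniques)

# the sum acts as a loop over the list; a match stores its position in _idx
Lookup(s) = (sum [_i=1:N] (s eq word(Uniques,_i) ? _idx=_i : 0), _idx)

print Lookup("TCP")     # position of TCP in Uniques

set yrange [0:*]
set offsets 0.5,0.5,0.5,0
set boxwidth 0.8
set style fill solid 0.5

# x = lookup index, y = 1 per row; smooth freq sums the y values per x
plot $Data u (Lookup(strcol(1))):(1):xtic(1) smooth freq w boxes notitle
### end of sketch
```

Note that Lookup() does not check for unknown strings: if a value is missing from Uniques, _idx keeps whatever value it had from the previous call.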

Script: (works with gnuplot>=5.0.0)

### histogram: occurrences of keywords
reset session

# create some random test data
myKeywords = 'UDP TCP ICMP ABC WWW NET COM FTP HTTP HTTPS'
set print $Data
    do for [i=1:3000] {
        print word(myKeywords,int(rand(0)*10)+1)
    }
set print

# create a unique list of strings from a column
addToList(list,col) = list.( strstrt(list,'"'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
Uniques = ''
stats $Data u (Uniques=addToList(Uniques,1),'') nooutput

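N = words(Uniques)
# Lookup(s) returns the position of s in Uniques: the sum is (mis)used as a loop
# which stores the matching index in _idx (strings not in the list are not detected)
Lookup(s) = (sum [_i=1:N] (s eq word(Uniques,_i) ? _idx=_i : 0), _idx)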

set xrange [1:N]
set xtics out
set ylabel "Counts"
set grid x,y
set offsets 0.5,0.5,0.5,0
set boxwidth 0.8

set style fill transparent solid 0.5 border
set key noautotitle

plot $Data u (Lookup(strcol(1))):(1):xtic(1) smooth freq w boxes
### end of script

Result: