I would like to create a histogram of the occurrences of text values in a single-column dataset using gnuplot. Could someone help, please? An example of the dataset:
UDP
TCP
TCP
UDP
ICMP
ICMP
ICMP
TCP
There are similar questions, e.g. "gnuplot automatic stack bar graph",
but they are still a bit different.
The following example creates some test data.
If you already know the keywords and want to have them
in a certain order, skip the step of creating a unique list and define Uniques = '...'
yourself. It might be advantageous to enclose the items in double quotes in case you have keywords which include spaces.
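If the list is defined by hand, it could look like the following minimal sketch. The keywords and their order are only illustrative assumptions (the item "DNS query" is made up to show a keyword with a space); these two lines would replace the automatic creation of Uniques in the script.

```gnuplot
# Hand-written keyword list (illustrative values, not taken from real data).
# Each item is enclosed in double quotes so that a keyword containing
# a space is still treated as a single word by word() and words().
Uniques = '"TCP" "UDP" "ICMP" "DNS query"'
N = words(Uniques)
```

Every keyword that occurs in the data should be in this list, because the lookup function only sets its index when it finds a match.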
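The script then defines a lookup function which returns the index of a keyword in the list (check help sum) and counts the occurrences with smooth frequency (check help smooth frequency) by taking the lookup index as x.
Script: (works with gnuplot>=5.0.0)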
### histogram: occurrences of keywords
reset session
# create some random test data
myKeywords = 'UDP TCP ICMP ABC WWW NET COM FTP HTTP HTTPS'
set print $Data
do for [i=1:3000] {
    print word(myKeywords,int(rand(0)*10)+1)
}
set print
# create a unique list of strings from a column
addToList(list,col) = list.( strstrt(list,'"'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
Uniques = ''
stats $Data u (Uniques=addToList(Uniques,1),'') nooutput
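N = words(Uniques)
# lookup function: returns the position of string s in the unique list
# (the sum only serves as a loop; a match stores its index in _idx)
Lookup(s) = (sum [_i=1:N] (s eq word(Uniques,_i) ? _idx=_i : 0), _idx)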
set xrange [1:N]
set xtics out
set ylabel "Counts"
set grid x,y
set offsets 0.5,0.5,0.5,0
set boxwidth 0.8
set style fill transparent solid 0.5 border
set key noautotitle
# x = lookup index, y = 1 per row; smooth freq sums the y values for each x
plot $Data u (Lookup(strcol(1))):(1):xtic(1) smooth freq w boxes
### end of script
Result: