I'm trying to create three histograms in one chart. Selecting the appropriate data works, but "smooth freq" doesn't work as expected.
$Data <<EOD
Med Gender Age
4 f 33.14
4 f 53.81
4 f 32.99
4 m 39.78
4 f 25.06
2 m 51.06
4 f 39.93
4 f 44.92
2 m 45.68
2 m 73.47
2 m 61.65
4 m 26.82
4 f 24.93
4 f 29.79
3 m 80.54
3 m 81.42
2 f 71.9
2 f 73.18
3 m 64.76
4 m 33.45
2 m 58.92
2 f 73.51
4 f 36.09
EOD
The data set consists of three different groups. The following "functions" are used to select the age values that belong to each group.
GROUP_LABELS = "2 3 4"
GROUP_NAMES = "Med_02 Med_03 Med_04"
is_true(c,x) = ( c == x ) ? 1.0 : NaN
age = "( column(\"Age\") )"
selected_age_values = "is_true( column(\"Med\"), i ) * @age"
x_min = 0
x_max = 100
n_bins = 20
bin_width = 1.*(x_max - x_min)/n_bins
bin(col) = floor(column(col)/bin_width)*bin_width
set boxwidth 0.5
set xtics out
set xrange[x_min:x_max]
plot for [i in GROUP_LABELS] $Data u ( @selected_age_values ):(1) smooth freq w boxes lc i-1 ti word( GROUP_NAMES, i-1 ) noenhanced
Unfortunately, the resulting chart only shows one spike for each data point, which is at least correctly colored.
I tried to simplify your script a bit, but putting three histograms into one plot makes it a bit complicated again (to plot and to read).
Since you have three histograms, each bin width (here: 5.0) is split into 3 bars. As an example: the range from 50 to 55 contains a bar from the first group, none from the second and one from the third group. Note that bars are plotted centered at the x value, so you have to shift each group by an odd multiple of half a boxwidth. In the script this is the term (i-1.5)*myBoxwidth, which works because the group labels are 2, 3 and 4.
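To make the offset arithmetic concrete, here is a minimal sketch that prints where the three bars of the bin 50-55 are centered. It reuses bin_width and myBoxwidth from the script below; n_groups, x0 and the loop index k (which corresponds to i-1 for the labels 2, 3, 4) are names introduced only for this illustration.

```gnuplot
# Sketch: bar centers inside one bin (illustrative names: n_groups, x0, k)
bin_width  = 5.0
n_groups   = 3
myBoxwidth = bin_width/n_groups      # each bar gets a third of the bin
x0         = 50.0                    # left edge of the example bin 50-55

# k = 1,2,3 corresponds to the group labels i = 2,3,4, i.e. (k-0.5) == (i-1.5)
do for [k=1:n_groups] {
    print sprintf("bar %d: center = %.3f", k, x0 + (k-0.5)*myBoxwidth)
}
```

The three centers come out at roughly 50.833, 52.5 and 54.167, so the bars sit side by side and together fill the bin exactly.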
The function inGroup() simply returns 1 if the value in the given column is identical to the group label i, and 0 otherwise. smooth freq will then sum up these 0s and 1s for each x value, which gives the count per bin and group.
I hope the rest is self-explanatory.
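If you want to check what smooth freq actually sums, one option is to write the result to a datablock with set table instead of drawing it. The following is only a sketch, assuming gnuplot 5.2 or later: $Demo holds five rows copied from the data (Med and Age columns only, no header line), and $Demo and $Counts are names made up for this example.

```gnuplot
# Sketch: inspect the sums produced by "smooth freq" for group 4
$Demo <<EOD
2 51.06
4 53.81
2 45.68
4 44.92
4 33.14
EOD

bin_width = 5.0
bin(x) = floor(x/bin_width)*bin_width       # left edge of the bin containing x
inGroup(col,i) = column(col) == int(i)      # 1 if the row belongs to group i, else 0

set table $Counts                           # redirect plot output into a datablock
plot $Demo u (bin($2)):(inGroup(1,4)) smooth freq
unset table
print $Counts                               # one line per distinct bin with the summed 0/1 values
```

For group 4 the bins starting at 30, 40 and 50 each end up with a sum of 1: the group-2 rows in the bins 40 and 50 contribute 0 and therefore do not change the count.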
There would be different ways of representing this: for example, one xtic for each range (e.g. 50-55) and the 3 bars corresponding to that range centered around the tic.
Script:
### three histograms in one plot
reset session
$Data <<EOD
Med Gender Age
4 f 33.14
4 f 53.81
4 f 32.99
4 m 39.78
4 f 25.06
2 m 51.06
4 f 39.93
4 f 44.92
2 m 45.68
2 m 73.47
2 m 61.65
4 m 26.82
4 f 24.93
4 f 29.79
3 m 80.54
3 m 81.42
2 f 71.9
2 f 73.18
3 m 64.76
4 m 33.45
2 m 58.92
2 f 73.51
4 f 36.09
EOD
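GROUP_LABELS = "2 3 4"                        # values of column "Med" to plot
GroupName(i) = sprintf("Med_%02d",int(i))     # legend entry, e.g. "Med_02"
x_min = 0
x_max = 100
n_bins = 20
bin_width = real(x_max - x_min)/n_bins        # here: 5.0
myBoxwidth = bin_width/words(GROUP_LABELS)    # one bar per group inside each bin
bin(x) = floor(x/bin_width)*bin_width         # left edge of the bin containing x
inGroup(col,i) = column(col) == int(i)        # 1 if the row belongs to group i, else 0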
set boxwidth myBoxwidth
set xlabel "Age"
set xrange[x_min:x_max]
set xtics 10 out
set mxtic 2
set ylabel "Count"
set ytics 1
set grid x, mx, y
set style fill transparent solid 0.3
plot for [i in GROUP_LABELS] $Data u (bin(column("Age"))+(i-1.5)*myBoxwidth):(inGroup(1,i)) \
smooth freq w boxes lc i-1 ti GroupName(i) noenhanced
### end of script
Result:
Addition: xlabels showing bin ranges
If you add the following two lines and an additional line to the plot command...
myXtic(i) = sprintf("%d-%d",i*bin_width,(i+1)*bin_width)
set xtics right rotate by 60 offset 2,0
plot for [i in GROUP_LABELS] $Data u (bin(column("Age"))+(i-1.5)*myBoxwidth):(inGroup(1,i)) \
smooth freq w boxes lc i-1 ti GroupName(i) noenhanced, \
for [i=0:n_bins-1] '+' u (i*bin_width):(NaN):xtic(myXtic(i)) every ::::0 notitle
... you will get the following. The 3 bars are actually not centered around the xtic but lie between the two tics which define the age range. The range 50-55 actually means: 50 <= age < 55.
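The half-open interval follows directly from the floor() in bin(). A short sketch with the same bin() definition as in the script (the three test values are chosen only for illustration):

```gnuplot
# Sketch: which bin a value lands in (lower edge included, upper edge excluded)
bin_width = 5.0
bin(x) = floor(x/bin_width)*bin_width

print bin(50.0)     # lower edge belongs to the bin starting at 50
print bin(54.99)    # still in the bin starting at 50
print bin(55.0)     # upper edge already belongs to the next bin, starting at 55
```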
There are certainly many more ways to create such a graph. I guess one should make it as easy as possible for the reader to understand.