The histogram does not summarize all values within a bin. The boxes are placed in the correct bin but printed separately on top of each other. This is visible when the fill style is transparent.
Some macros were included via "load" from other scripts, but the reduced code below should show the problem. I removed label and title settings so the resulting plot looks slightly different than the attached one.
reset
set key autotitle columnhead
#=======================================================================================
# from other gnuplot scripts included via "load"
fs_reference_age_min = 20
fs_reference_age_max = 100
gp_is_in_range(c,a,b) = ( c > a && c <= b ) ? 1.0 : NaN
fs_is_in_age_range(c) = gp_is_in_range( c, fs_reference_age_min, fs_reference_age_max )
#=======================================================================================
fs_age = 'PatientAge'
valid_age = 'fs_is_in_age_range( column(fs_age) )'
bin(x) = floor(x/bin_width)*bin_width
x_min = 0
x_max = 100
n_bins = 20
bin_width = real(x_max - x_min)/n_bins
group_boxwidth = 1
set boxwidth group_boxwidth*0.75
set style fill transparent solid 0.3
n = 4
offset = n
$Data <<EOD
Subgroup PatientAge
4 40.55
4 48.96
1 34.94
5 51.45
1 54.8
2 10.51
4 42.87
3 71.41
4 62.2
2 54.22
3 65.04
1 49.73
4 31.46
3 75.25
1 56.97
2 14.56
2 10.64
3 60.54
EOD
plot $Data u ( bin( column(fs_age) ) + ( offset - 0.5 ) * group_boxwidth ):( @valid_age ) smooth freq w boxes lc n ti 'NORM_DB' noenhanced
Thank you for providing a copy & paste minimal (non-)working example including data. This makes debugging much easier if one has all the information right away.
By filtering your data this way, you are introducing NaNs.
That's what you are doing with
gp_is_in_range(c,a,b) = ( c > a && c <= b ) ? 1.0 : NaN
This is introducing breaks in your data, e.g. a line plot would be interrupted.
To visualize this: if you plot your smooth freq result into a table, you would see the following:
# Curve 0 of 1, 17 points
# Curve title: "NORM_DB"
# x y xlow xhigh type
33.5 1 33.5 33.5 i
43.5 1 43.5 43.5 i
48.5 1 48.5 48.5 i
53.5 2 53.5 53.5 i
33.5 1 33.5 33.5 i
43.5 1 43.5 43.5 i
48.5 1 48.5 48.5 i
53.5 1 53.5 53.5 i
58.5 1 58.5 58.5 i
63.5 1 63.5 63.5 i
68.5 1 68.5 68.5 i
73.5 1 73.5 73.5 i
78.5 1 78.5 78.5 i
63.5 1 63.5 63.5 i
That's the data you put into the smooth freq option.
Apparently, smooth freq treats the different blocks (separated by the breaks that the NaNs introduce) individually. That's why you get 3 histograms or bar charts on top of each other.
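A table like the one above can be produced roughly as follows. This is only a sketch: it assumes gnuplot 5 or newer (for datablocks) and that all definitions and $Data from the question are already in place. The datablock name $Freq is made up for this illustration.

```gnuplot
# Assumes $Data, bin(), offset, group_boxwidth, fs_age and valid_age
# are defined exactly as in the question's script.
# $Freq is a hypothetical name chosen for this sketch.
set table $Freq      # send the plot output to a datablock instead of drawing it
plot $Data u ( bin( column(fs_age) ) + ( offset - 0.5 ) * group_boxwidth ):( @valid_age ) smooth freq w boxes lc n ti 'NORM_DB' noenhanced
unset table          # switch back to normal plotting
print $Freq          # show the tabulated points on the console
```

Inspecting the table this way makes it easy to see where the x values start over, i.e. where one block ends and the next one begins.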
So, simple solution (for gnuplot>5.0.6): before the plot command insert a line:
set datafile missing NaN
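In the context of the script from the question, the end of the file would then look roughly like this. Again only a sketch, assuming the rest of the script (definitions and $Data) stays unchanged:

```gnuplot
# Assumes everything above the plot command is unchanged from the question.
# Treat values that evaluate to NaN as missing data, so the filtered-out
# rows no longer split the input into separate blocks.
set datafile missing NaN
plot $Data u ( bin( column(fs_age) ) + ( offset - 0.5 ) * group_boxwidth ):( @valid_age ) smooth freq w boxes lc n ti 'NORM_DB' noenhanced
```

With this setting, the rows rejected by gp_is_in_range() are simply skipped, and smooth freq should sum all remaining values per bin into a single box.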