
Format values to show thousands, millions, billions, and trillions instead of scientific prefixes


Given a very simple chart generated with gnuplot:

$Data << EOD
2021-09-30 83360000000.0000
2021-12-31 123945000000.0000
2022-03-31 97278000000.0000
2022-06-30 82959000000.0000
2022-09-30 90146000000.0000
2022-12-31 117154000000.0000
2023-03-31 94836000000.0000
2023-06-30 81797000000.0000
2023-09-30 89498000000.0000
EOD

set xdata time
set timefmt "%Y-%m-%d"

set format y "%.0s %c"

plot $Data using 1:2 with boxes

Is there a way to format the y-axis to use k for thousands, m for millions, b for billions, and t for trillions instead of the character replacement of scientific notation (i.e., k, M, G, etc.)?


Solution

  • Here is a suggestion that uses only one prefix at a time on the axis. A logarithmic scale with multiple prefixes is probably also possible, but I guess it would become difficult with autoscaling, and you would have to place the tic labels "semi-manually".

    You need to determine the order of magnitude of your range. You have to be careful, because gnuplot seems to have issues with formatting and determining the correct power. For example:

    print int(log10(1000)/3)    # should return 1, but returns 0
    
    print gprintf("%T",95)      # should return 1, but returns 2
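The `int(log10(1000)/3)` surprise can be reproduced outside gnuplot. The Python snippet below is only an illustration, and it assumes (not confirmed by the answer) that gnuplot computes `log10(x)` as a ratio of natural logarithms, `log(x)/log(10)`. That ratio lands just below 3 for x = 1000, so dividing by 3 and truncating with `int()` gives 0, whereas a dedicated base-10 logarithm returns exactly 3.

```python
import math

x = 1000.0

# Base-10 log as a ratio of natural logs (assumed to mimic how gnuplot's log10 behaves)
via_ratio = math.log(x) / math.log(10)
# Dedicated base-10 logarithm from the C math library
via_log10 = math.log10(x)

print(via_ratio, int(via_ratio / 3))  # 2.9999999999999996 0
print(via_log10, int(via_log10 / 3))  # 3.0 1
```

Truncation toward zero is what turns a tiny representation error into a wrong order of magnitude, which is why the answer avoids logarithms for this step.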
    

    The first result is certainly a rounding error (binary/decimal representation: log10(1000)/3 evaluates to slightly less than 1, and int() truncates it to 0), but the second result I would consider a bug in the gprintf() function. Anyway, I think the following function, which takes a detour via a string, returns the correct power (at least, I haven't seen wrong results so far):

    power(n) = int(sprintf("%e",abs(n))[strstrt(sprintf("%e",abs(n)),'e')+1:])
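To check the string-based approach outside gnuplot, here is a Python transcription of the same idea (an illustration, not gnuplot code): format `abs(n)` with `%e` and read the integer after the `e`. Because the exponent is taken from the formatted text rather than computed with a logarithm, 1000 yields exactly 3 and 95 yields 1.

```python
def power(n: float) -> int:
    """Decimal exponent of n, read from its %e representation (mirrors the gnuplot power(n))."""
    s = "%e" % abs(n)                  # e.g. 123945000000.0 -> '1.239450e+11'
    return int(s[s.index("e") + 1:])  # int('+11') -> 11


for value in (95, 1000, 83360000000.0, 972.78):
    print(value, power(value))
```

One side effect worth knowing: `%e` rounds the mantissa to six decimals, so a value such as 999999.99 is formatted as `1.000000e+06` and reported as power 6, which matches how it would be labeled after rounding anyway.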
    

    So, what the script does:

    • use stats to get the maximum absolute value, which determines the prefix
    • get the prefix and set the y format accordingly
    • when plotting, divide the data values by the corresponding factor 10**pow3 (pow3 being the power rounded down to a multiple of 3)

    So, consider this a starting point...

    Script:

    ### use different prefixes k,m,b,t
    reset session
    
    $Data1 << EOD
    2021-09-30  83360000000.0000
    2021-12-31 123945000000.0000
    2022-03-31  97278000000.0000
    EOD
    
    $Data2 <<EOD
    2021-09-30  8336000.0000
    2021-12-31 12394500.0000
    2022-03-31  9727800.0000
    EOD
    
    $Data3 <<EOD
    2021-09-30  833.60
    2021-12-31 1239.45
    2022-03-31  972.78
    EOD
    
    power(n)  = int(sprintf("%e",abs(n))[strstrt(sprintf("%e",abs(n)),'e')+1:])
    prefix(n) = word('"" k m b t', power(n)/3+1)
    
    set format x "%Y\n%b" timedate
    set yrange [0:]
    set style fill solid 0.4
    set boxwidth 0.8 relative
    set key noautotitle
    
    set multiplot layout 3,1
        stats $Data1 u (abs($2)) nooutput
        pow3 = int(power(STATS_max)/3)*3
        set format y "%.1f ".prefix(STATS_max)
        plot $Data1 using (timecolumn(1,"%Y-%m-%d")):($2/10**pow3) with boxes
    
        stats $Data2 u (abs($2)) nooutput
        pow3 = int(power(STATS_max)/3)*3
        set format y "%.1f ".prefix(STATS_max)
        plot $Data2 using (timecolumn(1,"%Y-%m-%d")):($2/10**pow3) with boxes
    
        stats $Data3 u (abs($2)) nooutput
        pow3 = int(power(STATS_max)/3)*3
        set format y "%.1f ".prefix(STATS_max)
        plot $Data3 using (timecolumn(1,"%Y-%m-%d")):($2/10**pow3) with boxes
    unset multiplot
    ### end of script
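The per-panel logic of the script (maximum absolute value, then power, then prefix and divisor) can also be sketched in Python, for example to check which prefix and y format each data block would get. `scale_series` is a hypothetical helper name, not part of gnuplot. Unlike the gnuplot `word()` lookup, this sketch clamps the prefix index so that values below 1 get no prefix and values of 10^15 or more still use "t"; that clamping is an assumption of the sketch only.

```python
PREFIXES = ["", "k", "m", "b", "t"]  # same order as word('"" k m b t', ...)


def power(n: float) -> int:
    """Decimal exponent of n, taken from its %e representation."""
    s = "%e" % abs(n)
    return int(s[s.index("e") + 1:])


def scale_series(values):
    """Return (scaled values, prefix, y format) for one data block.

    Hypothetical helper mirroring one multiplot panel of the script:
    stats -> STATS_max, power -> pow3, then $2/10**pow3.
    """
    vmax = max(abs(v) for v in values)                         # like STATS_max of abs($2)
    group = min(max(power(vmax) // 3, 0), len(PREFIXES) - 1)  # clamping: sketch-only assumption
    divisor = 10 ** (3 * group)                                # like 10**pow3
    prefix = PREFIXES[group]
    return [v / divisor for v in values], prefix, "%.1f " + prefix


data1 = [83360000000.0, 123945000000.0, 97278000000.0]
data2 = [8336000.0, 12394500.0, 9727800.0]
data3 = [833.60, 1239.45, 972.78]

for data in (data1, data2, data3):
    scaled, prefix, fmt = scale_series(data)
    print(prefix, [fmt % v for v in scaled])
```

For the three data blocks of the script this picks "b", "m", and "k" respectively, and `3 * group` plays the role of `pow3`.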
    
