
Plotting COVID-19 data in Gnuplot


I'm trying to plot some COVID-19 data with gnuplot. The data is in a CSV file whose first row holds the dates; each following row holds one state's case counts, one column per date. I'd like to make a single plot for each state (each row), but I'm not having much luck. Any help?

This is my plot script so far. I'm using plot for [col=5:30:1]... because the first 4 columns are the state name, country, and geolocation. I thought I'd concentrate on the data points for now and eventually figure out how to display the state name on the plot as well. I've grep'd the USA data out of the main CSV file to create "us.dat":

set key autotitle columnhead
set term png size 1024, 768
set key outside
set datafile separator ","
set title 'mygraph'
set ylabel 'count'
set xlabel 'time'
set grid
set term png
set output "/tmp/covid19.png"    
plot for [col=5:30:1] "us.dat" using col

And a snip of the "us.dat" file:

Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20,2/26/20,2/27/20,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20,3/9/20,3/10/20,3/11/20
Washington,US,47.4009,-121.4905,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,267,366
New York,US,42.1657,-74.9481,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,173,220
California,US,36.1162,-119.6816,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,144,177
Massachusetts,US,42.2302,-71.5301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,92,95

However, the resulting plot isn't quite right:

[plot image]


Solution

  • Here is a pure gnuplot version:

    $data <<EOD
    Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20,2/26/20,2/27/20,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20,3/9/20,3/10/20,3/11/20
    Washington,US,47.4009,-121.4905,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,267,366
    New York,US,42.1657,-74.9481,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,173,220
    California,US,36.1162,-119.6816,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,144,177
    Massachusetts,US,42.2302,-71.5301,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,92,95
    EOD
    
    N = 50
    array X[N]
    array Y[N]
    
    set datafile separator ","
    
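    # A dummy plot whose only purpose is to copy one row into the arrays.
    # Row 0 (the header) fills X with the date strings; the row whose first
    # column is "Washington" fills Y with the case counts.
    plot $data using ($0==0 ? sum [i=1:N] (X[i]=strcol(i+4), 0) : \
                      (strcol(1) eq "Washington") ? sum [i=1:N] (Y[i]=column(i+4)) : $0, $0) : 0
    
    # Interpret the x values (the date strings stored in X) as time data.
    set xdata time
    set timefmt "%m/%d/%y"
    
    # Plot the array: $1 is the array index, used to look up date and count.
    plot X using (X[$1]):(Y[$1]) with linespoints pt 7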
    

    Explanation:

    First, there is a dummy plot. When the first row is read ($0==0), a loop over all date columns stores the dates in array X. Similarly, when the row whose first column is "Washington" is read, its counts are stored in array Y. The number of date columns (N = 50 here) and their offset (4 leading columns) must be known in advance.

    The sum function is (mis)used purely as a loop. Because the date row contains strings, which cannot be summed, each assignment is followed by , 0 so that the summed term is the number 0 rather than a string.
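
    The sum-as-loop trick can be tried on its own, without any data file. The following is only a minimal sketch: the array A, its size of 5, and the variable dummy are made up for illustration, and it assumes gnuplot 5.2 or later, since it relies on arrays.

    ```gnuplot
    # Hypothetical stand-alone example of using sum as a loop.
    N = 5
    array A[N]

    # (A[i] = i*i, 0) performs the assignment and then evaluates to 0,
    # so the total of the sum is 0 and can simply be ignored.
    dummy = sum [i=1:N] (A[i] = i*i, 0)

    # Inspect one of the elements filled in by the "loop".
    print A[3]
    ```

    The same pattern appears in the dummy plot above, where the assigned values come from strcol(i+4) and column(i+4) instead of i*i.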