
Orange - rewriting data by making new rows based on column values


I am trying to process COVID-19 case data

(the source, for interest: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv)

The data forms a matrix, listing dates in columns and countries in rows. A simplified view:

country 1/20/20 1/21/20 1/22/20 ... etc. ...
China   100     120     144     ... etc. ...
US      0       0       1       ... etc. ...
...
etc.
...

I am trying to turn the date columns and their figures into two new features, say "date" and "confirmed", as in:

country date     confirmed
China   1/20/20  100
China   1/21/20  120
China   1/22/20  144
US      1/20/20  0
US      1/21/20  0
US      1/22/20  1
...  etc.  ...

I am interested in any solution that embeds in Orange, though - of course - we can prepare the data before importing it!


Solution

  • Are you trying to do it in a script or with canvas? With (pure) canvas, I guess you can't. You can of course always use a Python script widget and do it there.

    In a script (standalone or within canvas) you should treat Orange.data.Table as immutable, although this is not enforced by Orange itself. A few versions back the obsolete methods that could change the number of rows were removed. You can still change the data in-place, but I wouldn't recommend it.

    You will have to create a new table that will have the appropriate size from the start. I guess the simplest way to do it would be to collect all the data you need in a Python list (of lists) and then pass it to Table.from_list.

    Disclosure: I'm one of the Orange developers and I'm in the middle of writing a blog post using exactly this data. It's going to be a series, and we will also show some scripts like this in a week or two.
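The "collect everything in a list of lists, then call Table.from_list" idea from the answer could look roughly like the sketch below. It is not the answerer's own script. It assumes the simplified layout from the question (a country column followed only by date columns). The helper names `wide_to_long` and `to_orange_table` are made up for illustration. Storing "confirmed" as a continuous attribute and keeping "country" and "date" as string metas is just one possible domain. Each row is ordered attributes first, then metas, which is how I understand Table.from_list reads its rows; check that against your Orange version.

```python
def wide_to_long(header, rows):
    """Turn one row per country into one row per (country, date) pair.

    Assumes header[0] names the country column and every later
    column is a date, as in the simplified view in the question.
    """
    dates = header[1:]
    long_rows = []
    for row in rows:
        country, counts = row[0], row[1:]
        for date, count in zip(dates, counts):
            # Order: attribute value first, then the two meta values.
            long_rows.append([float(count), country, date])
    return long_rows


def to_orange_table(long_rows):
    """Build a new Orange table of the right size from the collected rows."""
    # Imported here so the reshaping above also runs without Orange installed.
    from Orange.data import ContinuousVariable, Domain, StringVariable, Table

    domain = Domain(
        [ContinuousVariable("confirmed")],
        metas=[StringVariable("country"), StringVariable("date")],
    )
    return Table.from_list(domain, long_rows)


# Tiny sample taken from the question's simplified view.
header = ["country", "1/20/20", "1/21/20", "1/22/20"]
wide = [["China", 100, 120, 144], ["US", 0, 0, 1]]

long_rows = wide_to_long(header, wide)
for row in long_rows:
    print(row)
```

Inside a Python Script widget you would build `header` and `wide` from the incoming table instead of the literals above and assign the result of `to_orange_table(long_rows)` to the widget's output variable. The real CSV has a few extra leading columns (province, coordinates), so the slicing would need adjusting.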