Search code examples
vega-lite

How to add an extra field to the dataset in Vega-Lite


My data set is an array of the following form:

[
  { "DATE" : "2020-01-02", "COUNTRY" : "Spain", "COUNT" : 110 },
  { ... },
  { ... }
]

There are multiple countries and multiple days. There are no gaps in dates.

I want to inject field DAYS_PASSED (and subsequently use it for the X axis) using the following algorithm:

  1. Check the value of DAYS_PASSED for the previous day for the same country and assign it to variable TEMP. (If the previous day does not exist, assume 0);
  2. Calculate DAYS_PASSED using the following formula:
   if TEMP > 0, then DAYS_PASSED = TEMP + 1
   else-if COUNT > 100 then DAYS_PASSED = 1
   else DAYS_PASSED = 0

So far I have done this in a preprocessing step (outside of Vega-Lite) but I was wondering if it was possible to migrate the calculation to Vega-Lite, maybe by plugging-in in a JavaScript function somehow?

I would also like to be able to expose 100 (from the COUNT > 100 condition) in the graph so that the user can tweak it to, say, 200.


Solution

  • You can do this with a series of transforms; for example:

    
      "transform": [
        {"calculate": "toDate(datum.DATE)", "as": "date"},
        {"calculate": "datum.COUNT < 100", "as": "pre100"},
        {
          "joinaggregate": [{"op": "sum", "field": "pre100", "as": "offset"}],
          "groupby": ["COUNTRY"]
        },
        {
          "window": [{"op": "count", "as": "daysPassed"}],
          "groupby": ["COUNTRY"],
          "sort": [{"field": "date"}]
        },
        {"calculate": "max(0, datum.daysPassed - datum.offset)", "as": "daysPassed"}
      ],
    

    Here is a more full example showing this for a small dataset (vega editor):

    {
      "data": {
        "values": [
          {"DATE": "2020-02-02", "COUNTRY": "Spain", "COUNT": 50},
          {"DATE": "2020-02-03", "COUNTRY": "Spain", "COUNT": 70},
          {"DATE": "2020-02-04", "COUNTRY": "Spain", "COUNT": 110},
          {"DATE": "2020-02-05", "COUNTRY": "Spain", "COUNT": 150},
          {"DATE": "2020-02-06", "COUNTRY": "Spain", "COUNT": 200},
          {"DATE": "2020-02-02", "COUNTRY": "Italy", "COUNT": 90},
          {"DATE": "2020-02-03", "COUNTRY": "Italy", "COUNT": 100},
          {"DATE": "2020-02-04", "COUNTRY": "Italy", "COUNT": 140},
          {"DATE": "2020-02-05", "COUNTRY": "Italy", "COUNT": 190},
          {"DATE": "2020-02-06", "COUNTRY": "Italy", "COUNT": 250}
        ]
      },
      "transform": [
        {"calculate": "toDate(datum.DATE)", "as": "date"},
        {"calculate": "datum.COUNT < 100", "as": "pre100"},
        {
          "joinaggregate": [{"op": "sum", "field": "pre100", "as": "offset"}],
          "groupby": ["COUNTRY"]
        },
        {
          "window": [{"op": "count", "as": "daysPassed"}],
          "groupby": ["COUNTRY"],
          "sort": [{"field": "date"}]
        },
        {"calculate": "max(0, datum.daysPassed - datum.offset)", "as": "daysPassed"}
      ],
      "concat": [
        {
          "mark": "line",
          "encoding": {
            "x": {"field": "DATE", "type": "temporal"},
            "y": {"field": "COUNT", "type": "quantitative"},
            "color": {"field": "COUNTRY", "type": "nominal"}
          }
        },
        {
          "mark": "line",
          "transform": [{"filter": "datum.daysPassed > 0"}],
          "encoding": {
            "x": {"field": "daysPassed", "type": "quantitative"},
            "y": {"field": "COUNT", "type": "quantitative"},
            "color": {"field": "COUNTRY", "type": "nominal"}
          }
        }
      ]
    }
    

    enter image description here