Search code examples
vegavega-lite

Vega-lite Multiple aggregations in Transforms


I want to implement two diff aggregations in a transform as they have different groupby conditions, but it seems not possible with vega-lite

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": {

      {"response":200,"request":"/ST"},
      {"response":500,"request":"/ST"},
      {"response":200,"request":"/PP"},
      {"response":500,"request":"/PP"},
      {"response":200,"request":"/CP"},
      {"response":200,"request":"/CP"},
      {"response":500,"request":"/CP"},
      {"response":500,"request":"/CP"},
      {"response":500,"request":"/CP"},
      {"response":500,"request":"/CP"},
      {"response":500,"request":"/CP"},
      {"response":500,"request":"/CP"},
      {"response":503,"request":"/CP"},
      {"response":503,"request":"/CP"},
      {"response":503,"request":"/CP"}

  "transform": [
    {
      "aggregate": [{
       "op": "count",
       "as": "response_count"
      }],
      "groupby": ["response","request"]
    },
    {
      "aggregate": [{
       "op": "count",
       "as": "response_c"
      }],
      "groupby": ["request"]
    }
     ],

     {"mark": "bar",
      "encoding": {
        "x": {"field": "response_count", "type": "quantitative", "stack": "zero"},
        "y": {"field": "request", "type": "nominal"},
        "color": {"field": "response", "type": "nominal"}}
}

Is there any way of achieving this ? Are multiple aggregations like this supported?


Solution

  • Yes, multiple aggregations like this are supported, but your chart has undefined fields in the end because you have have not referenced them in your aggregations. You start with this data:

    [
      {"response": 200, "request": "/ST"},
      {"response": 500, "request": "/ST"},
      {"response": 200, "request": "/PP"},
      {"response": 500, "request": "/PP"},
      {"response": 200, "request": "/CP"},
      {"response": 200, "request": "/CP"},
      {"response": 500, "request": "/CP"},
      {"response": 500, "request": "/CP"},
      {"response": 500, "request": "/CP"},
      {"response": 500, "request": "/CP"},
      {"response": 500, "request": "/CP"},
      {"response": 500, "request": "/CP"},
      {"response": 503, "request": "/CP"},
      {"response": 503, "request": "/CP"},
      {"response": 503, "request": "/CP"}
    ]
    

    The first aggregation groups by "response" and "request", and adds "response_count" in each group, which looks like this:

    [
      {"response": 200, "request": "/ST", "response_count": 1},
      {"response": 500, "request": "/ST", "response_count": 1},
      {"response": 200, "request": "/PP", "response_count": 1},
      {"response": 500, "request": "/PP", "response_count": 1},
      {"response": 200, "request": "/CP", "response_count": 2},
      {"response": 500, "request": "/CP", "response_count": 6},
      {"response": 503, "request": "/CP", "response_count": 3},
    ]
    

    Your second aggregation takes this, groups by "request", and adds "response_c" in each group, which looks like this:

    [
      {"request": "/ST", "response_c": 2},
      {"request": "/PP", "response_c": 2},
      {"request": "/CP", "response_c": 3},
    ]
    

    Notice that any fields you don't reference in the aggregation are dropped.

    Your specification then refers to fields that no longer exist in the dataset, which results in a blank chart.

    You could fix this by specifying in the second aggregation what you want to happen with the fields that are dropped; for example, you could keep the sum of "response_count" and the minimum of "response" (open in editor):

    {
      "data": {
        "values": [
          {"response": 200, "request": "/ST"},
          {"response": 500, "request": "/ST"},
          {"response": 200, "request": "/PP"},
          {"response": 500, "request": "/PP"},
          {"response": 200, "request": "/CP"},
          {"response": 200, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 503, "request": "/CP"},
          {"response": 503, "request": "/CP"},
          {"response": 503, "request": "/CP"}
        ]
      },
      "transform": [
        {
          "aggregate": [{"op": "count", "as": "response_count"}],
          "groupby": ["response", "request"]
        },
        {
          "aggregate": [
            {"op": "count", "as": "response_c"},
            {"op": "sum", "field": "response_count", "as": "response_count"},
            {"op": "min", "field": "response", "as": "response"}
          ],
          "groupby": ["request"]
        }
      ],
      "mark": "bar",
      "encoding": {
        "x": {"field": "response_count", "type": "quantitative", "stack": "zero"},
        "y": {"field": "request", "type": "nominal"},
        "color": {"field": "response", "type": "nominal"}
      }
    }
    

    enter image description here

    Better in this particular case might be to leave out the second aggregation altogether, in which case the second aggregation essentially happens visually via stacking of the bars (editor):

    {
      "data": {
        "values": [
          {"response": 200, "request": "/ST"},
          {"response": 500, "request": "/ST"},
          {"response": 200, "request": "/PP"},
          {"response": 500, "request": "/PP"},
          {"response": 200, "request": "/CP"},
          {"response": 200, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 500, "request": "/CP"},
          {"response": 503, "request": "/CP"},
          {"response": 503, "request": "/CP"},
          {"response": 503, "request": "/CP"}
        ]
      },
      "transform": [
        {
          "aggregate": [{"op": "count", "as": "response_count"}],
          "groupby": ["response", "request"]
        }
      ],
      "mark": "bar",
      "encoding": {
        "x": {"field": "response_count", "type": "quantitative", "stack": "zero"},
        "y": {"field": "request", "type": "nominal"},
        "color": {"field": "response", "type": "nominal"}
      }
    }
    

    enter image description here