Search code examples
vega-lite

Vega Lite group by specific attributes based on another column


Trying to find the mean of Gestation/Incubation(days) for the attributes:

Accipitriformes, Anseriformes, Charadriiformes 

which are apart of the Aves Class. I do not want to find the mean of any other attributes in the Order Column, only the ones apart of the Aves class. A sample of my dataset looks like this:

|   Class     |   Order   |    Gestation/Incubation(days)     
  Amphilbia       Anura                   5
  Amphilbia       Anura                   4
  Amphilbia       Anura                   2
  Amphilbia       Caudata                 4
  Amphilbia       Caudata                 2
   Mammalia     Artiodactyla              10
   Mammalia     Artiodactyla              8
   Mammalia       Rodentia                14
   Mammalia       Rodentia                13
     Aves      Accipitriformes            12
     Aves      Accipitriformes            17
     Aves      Accipitriformes            12
     Aves       Anseriformes              9
     Aves       Anseriformes              8
     Aves       Anseriformes              9
     Aves     Charadriiformes             10
     Aves     Charadriiformes             12
     Aves     Charadriiformes             14

I am able to find the mean of the different attributes in the Class column such as (see vega-lite demo link):

Amphilbia, Mammalia, Aves 

but I am not able to find the mean of the attributes in the Order Column where Class = Aves.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
   "data": {
     "url": "https://raw.githubusercontent.com/cathal84/COMP40610/master/anage_data.txt",
    "format": {"type": "tsv"}
       },
   "title": {
     "text": " Average Gestation/Incubation days for Orders with Aves",
    "anchor": "middle"
   },
   "width": 600,
   "height": 600,
  "transform": [
     {
       "aggregate": [
        {"op": "average", "field": "Gestation/Incubation (days)", "as": "avg_incub"},
        {"op": "count", "field": "Class", "as": "make_cnt"}
      ],
      "groupby": ["Class"]
    },
     {"filter": "datum.make_cnt > 50"}
   ],
   "mark": {"type": "bar"},
   "encoding": {
     "y": {
       "field": "avg_incub",
       "type": "quantitative",
       "axis": {"title": "Average Incubation"}
     },
     "x": {
       "field": "Class",
       "type": "nominal",
       "sort": {"encoding": "x", "order": "descending"},
       "axis": {"title": "Orders"}
    }
  }
}

Link to demo

I attempted to use the filter function to filter my data so only data with Class == Aves is left but that did not resolve my issue. I must not be using it right. Unless their is another way of doing what I am trying to achieve.

{"filter": "datum.Class == 'Aves'"}

Solution

  • You can do this with two filter steps, one to filter by class, and one to filter by order. At that point, using aggregates within encodings is the most straightforward way to compute the grouped mean.

    Here is an example (vega editor):

    {
      "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
      "data": {
        "url": "https://raw.githubusercontent.com/cathal84/COMP40610/master/anage_data.txt",
        "format": {"type": "tsv"}
      },
      "title": {
        "text": " Average Gestation/Incubation days for Orders with Aves",
        "anchor": "middle"
      },
      "transform": [
        {"filter": "datum.Class == 'Aves'"},
        {"filter": {"field": "Order", "oneOf": ["Accipitriformes", "Anseriformes", "Charadriiformes"]}}
      ],
      "mark": {"type": "bar"},
      "encoding": {
        "x": {"field": "Order", "type": "nominal"},
        "y": {"field": "Gestation/Incubation (days)", "type": "quantitative", "aggregate": "mean"}
      }
    }
    

    enter image description here