Vega transform to select the first n rows


Is there a Vega/Vega-Lite transform that I can use to select the first n rows of a data set?

Suppose I get a dataset from a URL such as:

Person Height
Jeremy 6.2
Alice 6.0
Walter 5.8
Amy 5.6
Joe 5.5

and I want to create a bar chart showing the height of only the three tallest people. Assume that we know for certain that the dataset from the URL is already sorted. Assume that we cannot change the data as returned by the URL.

I want to do something like this:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "heights.csv"
  },
  "transform": [
      {"head": 3}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Person", "type": "nominal"},
    "y": {"field": "Height", "type": "quantitative"}
  }
}

However, the head transform does not actually exist. Is there something else I can do to get the same effect?


Solution

  • The Vega-Lite documentation has a related example, "Filtering Top-K Items".

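    Your case is a bit more specialized: you do not want to select rows based on rank, but rather based on the original ordering of the data. You can do this using a count-based window transform followed by an appropriate filter. With the default window frame, count is a running count, so each row receives its 1-based position in the data, and the filter keeps only the rows whose position is at most 3. For example: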

    {
      "data": {
        "values": [
          {"Person": "Jeremy", "Height": 6.2},
          {"Person": "Alice", "Height": 6.0},
          {"Person": "Walter", "Height": 5.8},
          {"Person": "Amy", "Height": 5.6},
          {"Person": "Joe", "Height": 5.5}
        ]
      },
      "transform": [
        {"window": [{"op": "count", "as": "count"}]},
        {"filter": "datum.count <= 3"}
      ],
      "mark": "bar",
      "encoding": {
        "x": {"field": "Height", "type": "quantitative"},
        "y": {"field": "Person", "type": "nominal", "sort": null}
      }
    }
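
    The same idea can be applied to the spec from the question. The following is a minimal sketch rather than a tested chart: it assumes the data is served at heights.csv as in the question, uses the row_number window operation instead of count (both number the rows in their existing order), and introduces a hypothetical field name, row, to hold that number.

    ```json
    {
      "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
      "data": {
        "url": "heights.csv"
      },
      "transform": [
        {"window": [{"op": "row_number", "as": "row"}]},
        {"filter": "datum.row <= 3"}
      ],
      "mark": "bar",
      "encoding": {
        "x": {"field": "Person", "type": "nominal", "sort": null},
        "y": {"field": "Height", "type": "quantitative"}
      }
    }
    ```

    Changing the 3 in the filter expression selects a different n. Setting "sort": null on the nominal axis keeps the bars in the order the rows arrive, so the already-sorted data is displayed tallest first.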
    
