Vega-Lite: select the top N objects (count)


I just started using Vega-Lite and was wondering how to cut out everything after my 10th object. I have thousands of rows and am only interested in the top 10.

This is what I have so far:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/anage.csv",
    "format": {
      "type": "csv"
    }
  },
  "transform": [
    {
      "filter": {
        "field": "Female_maturity_(days)",
        "gt": 0
      }
    }
  ],
  "title": {
    "text": "",
    "anchor": "middle"
  },
  "mark": "bar",
  "encoding": {
    "y": {
      "field": "Common_name",
      "type": "nominal",
      "sort": {
        "op": "mean",
        "field": "Female_maturity_(days)",
        "order": "descending"
      }
    },
    "x": {
      "field": "Female_maturity_(days)",
      "type": "quantitative"
    }
  },
  "config": {}
}

Solution

  • You can follow the Filtering Top K Items example from the Vega-Lite documentation. The result looks something like this:

    {
      "data": {
        "url": "https://raw.githubusercontent.com/DanStein91/Info-vis/master/anage.csv",
        "format": {"type": "csv", "parse": {"Female_maturity_(days)": "number"}}
      },
      "transform": [
        {
          "window": [{"op": "rank", "as": "rank"}],
          "sort": [{"field": "Female_maturity_(days)", "order": "descending"}]
        },
        {"filter": "datum.rank <= 10"}
      ],
      "mark": "bar",
      "encoding": {
        "y": {
          "field": "Common_name",
          "type": "nominal",
          "sort": {
            "op": "mean",
            "field": "Female_maturity_(days)",
            "order": "descending"
          }
        },
        "x": {"field": "Female_maturity_(days)", "type": "quantitative"}
      },
      "title": {"text": "", "anchor": "middle"}
    }
    


    One note: when applying transforms to CSV data (as opposed to JSON data), it's important to use format.parse to specify the desired data type for each column involved. By default, CSV columns are interpreted as strings, which can cause sorting-based operations such as the rank window above to behave in unexpected ways.
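
To see the effect of format.parse in isolation, here is a minimal sketch of the same top-K pattern on a tiny inline CSV. The column names name and days and the four rows are made up for illustration and do not come from the anage dataset. The sketch assumes Vega-Lite v4, where a string passed as data.values is parsed according to format.type.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "values": "name,days\nA,9\nB,10\nC,100\nD,25",
    "format": {"type": "csv", "parse": {"days": "number"}}
  },
  "transform": [
    {
      "window": [{"op": "rank", "as": "rank"}],
      "sort": [{"field": "days", "order": "descending"}]
    },
    {"filter": "datum.rank <= 2"}
  ],
  "mark": "bar",
  "encoding": {
    "y": {"field": "name", "type": "nominal", "sort": "-x"},
    "x": {"field": "days", "type": "quantitative"}
  }
}
```

With days parsed as numbers, the two bars kept should be C (100) and D (25). If the values were compared as strings instead, "9" would sort ahead of "25" and "100", which is the kind of surprise the note above warns about. Also keep in mind that rank gives tied values the same rank, so the filter can let through more than K rows; the row_number window op is an option if exactly K rows are needed.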