Vega-Lite - Scaling to Large Datasets

I have used the density transform in Vega-Lite for smaller datasets. However, I now have a larger dataset with millions of observations, stored more compactly as one row per distinct value plus a count, and I'd like to apply a weighted density transform to it. My attempt is as follows:


{
  "$schema": "",
//  My data set is represented more compactly as follows
//  "data": {
//    "values": [
//      {"size": 1, "observations": 1},
//      {"size": 2, "observations": 2},
//      {"size": 3, "observations": 4},
//      {"size": 4, "observations": 6},
//      {"size": 5, "observations": 3}
//    ]
//  },

//  Expanding the dataset produces the right plot but is impractical
//  given data volumes (in the millions of observations)
  "data": {
    "values": [
      {"size": 1, "observation": "observation 1 of 1"},
      {"size": 2, "observation": "observation 1 of 2"},
      {"size": 2, "observation": "observation 2 of 2"},
      {"size": 3, "observation": "observation 1 of 4"},
      {"size": 3, "observation": "observation 2 of 4"},
      {"size": 3, "observation": "observation 3 of 4"},
      {"size": 3, "observation": "observation 4 of 4"},
      {"size": 4, "observation": "observation 1 of 6"},
      {"size": 4, "observation": "observation 2 of 6"},
      {"size": 4, "observation": "observation 3 of 6"},
      {"size": 4, "observation": "observation 4 of 6"},
      {"size": 4, "observation": "observation 5 of 6"},
      {"size": 4, "observation": "observation 6 of 6"},
      {"size": 5, "observation": "observation 1 of 1"},
      {"size": 5, "observation": "observation 2 of 2"}
    ]
  },
  "mark": "area",
  "transform": [
//  I believe Vega has a weight parameter in the density transform
//  Is there an equivalent in Vega Lite?
    {
      //"weight": "observations",
      "density": "size"
    }
  ],
  "encoding": {
    "x": {"field": "value", "type": "quantitative"},
    "y": {"field": "density", "type": "quantitative"}
  }
}


The dataset I actually have is the compact one that is commented out above. Expanding it to one row per observation produces the correct plot, but given the number of observations I suspect this is impractical unless there is a performant way to do the expansion inside Vega-Lite.

I believe Vega has a weight parameter in the density transform, but in the environment I'm working in I only have access to Vega-Lite. Is there another way to produce a weighted density transform in Vega-Lite?


  • That weight parameter in Vega isn't what you're looking for. It weights the component probability distributions when you combine several of them, not the individual observations. Out of the box, neither Vega nor Vega-Lite is suited to huge datasets, although several projects build on Vega to scale to large datasets.

    If you can't use one of those projects, your only option is to precompute the distribution and get Vega to display the result.
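
As a rough illustration of the precompute route, here is a minimal sketch in plain Python (standard library only). It evaluates a Gaussian kernel density estimate directly on the compact data, using each row's `observations` count as its weight. For a fixed bandwidth this gives the same curve as expanding every observation into its own row, but the cost depends on the number of distinct sizes rather than the number of observations. The function name `weighted_density`, the bandwidth of 0.5, the grid of 50 steps, and the 3-bandwidth padding are all assumptions chosen for the example. They are not Vega-Lite defaults, so the curve will not necessarily match what the built-in density transform draws.

```python
import json
import math

# Compact data from the question: one row per distinct size, with a count.
compact = [
    {"size": 1, "observations": 1},
    {"size": 2, "observations": 2},
    {"size": 3, "observations": 4},
    {"size": 4, "observations": 6},
    {"size": 5, "observations": 3},
]


def weighted_density(rows, bandwidth, steps=50):
    """Gaussian KDE on a regular grid, weighting each row by its count.

    `bandwidth` and `steps` are illustrative parameters (assumptions),
    not values taken from Vega or Vega-Lite.
    """
    total = sum(r["observations"] for r in rows)
    # Pad the grid by three bandwidths so the tails are mostly covered.
    lo = min(r["size"] for r in rows) - 3 * bandwidth
    hi = max(r["size"] for r in rows) + 3 * bandwidth
    # Normalising constant so the curve integrates to (approximately) 1.
    norm = 1.0 / (total * bandwidth * math.sqrt(2 * math.pi))
    curve = []
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        density = norm * sum(
            r["observations"] * math.exp(-0.5 * ((x - r["size"]) / bandwidth) ** 2)
            for r in rows
        )
        curve.append({"value": x, "density": density})
    return curve


curve = weighted_density(compact, bandwidth=0.5)

# The precomputed points replace the density transform in the spec;
# the field names match the encoding used in the question.
spec = {
    "data": {"values": curve},
    "mark": "area",
    "encoding": {
        "x": {"field": "value", "type": "quantitative"},
        "y": {"field": "density", "type": "quantitative"},
    },
}
spec_json = json.dumps(spec, indent=2)
```

The resulting `spec_json` contains only the grid points (51 here), so the payload handed to Vega-Lite stays small however many observations the counts represent.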