Tags: powerbi, vega-lite, deneb

Calculate distribution plot with arbitrary quantiles


The following Vega-Lite code creates a distribution plot from the minimum, the 25 % quantile, the median (the 50 % quantile), the 75 % quantile and the maximum, similar to a boxplot. The picture below shows what it renders. My dataset has two columns: the first holds the category ("1", "2" or "3"; called "Subcategory" in the code) and the second is a numerical variable ("Total Sales").

{
  "data": {"name": "dataset"},
  "transform": [
    {
      "aggregate": [
        {
          "op": "min",
          "field": "Total Sales",
          "as": "Min"
        },
        {
          "op": "max",
          "field": "Total Sales",
          "as": "Max"
        },
        {
          "op": "median",
          "field": "Total Sales",
          "as": "Median"
        },
        {
          "op": "q1",
          "field": "Total Sales",
          "as": "Q1"
        },
        {
          "op": "q3",
          "field": "Total Sales",
          "as": "Q3"
        }
      ],
      "groupby": ["Subcategory"]
    }
  ],
  "layer": [
    {
      "mark": {
        "type": "bar",
        "color": "red"
      },
      "encoding": {
        "y": {
          "field": "Min",
          "type": "quantitative"
        },
        "y2": {"field": "Q1"}
      }
    },
    {
      "mark": {
        "type": "bar",
        "color": "blue"
      },
      "encoding": {
        "y": {
          "field": "Q1",
          "type": "quantitative"
        },
        "y2": {"field": "Median"}
      }
    },
    {
      "mark": {
        "type": "bar",
        "color": "green"
      },
      "encoding": {
        "y": {
          "field": "Median",
          "type": "quantitative"
        },
        "y2": {"field": "Q3"}
      }
    },
    {
      "mark": {
        "type": "bar",
        "color": "yellow"
      },
      "encoding": {
        "y": {
          "field": "Q3",
          "type": "quantitative"
        },
        "y2": {"field": "Max"}
      }
    }
  ],
  "encoding": {
    "x": {"field": "Subcategory"}
  }
}

[Image: distribution plot for different categories]

How do I create the corresponding picture for other quantiles, such as the 5 %, 25 %, 50 % (median), 75 % and 95 % quantiles, instead of the usual boxplot quantiles rendered by the code above? I am not sure how to change the code appropriately.


Solution

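  • The left picture is the original and the right one uses the new quantiles. You can hover over the bars to check the values. The quantile transform computes the requested quantiles per Subcategory, the calculate step gives each one a short label ('a' to 'e'), and the pivot step turns those labels into columns that the layers can reference.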

    {
      "data": {"name": "dataset"},
      "transform": [
        {
          "quantile": "Total Sales",
          "probs": [
            0.05,
            0.25,
            0.5,
            0.75,
            0.95
          ],
          "groupby": ["Subcategory"]
        },
        {
          "calculate": "datum.prob==0.05?'a':datum.prob==0.25?'b':datum.prob==0.5?'c':datum.prob==0.75?'d':'e'",
          "as": "prob2"
        },
        {
          "pivot": "prob2",
          "groupby": ["Subcategory"],
          "value": "value"
        }
      ],
      "layer": [
        {
          "mark": {
            "type": "bar",
            "color": "red",
            "tooltip": true
          },
          "encoding": {
            "y": {
              "field": "a",
              "type": "quantitative"
            },
            "y2": {"field": "b"}
          }
        },
        {
          "mark": {
            "type": "bar",
            "color": "blue",
            "tooltip": true
          },
          "encoding": {
            "y": {
              "field": "b",
              "type": "quantitative"
            },
            "y2": {"field": "c"}
          }
        },
        {
          "mark": {
            "type": "bar",
            "color": "green",
            "tooltip": true
          },
          "encoding": {
            "y": {
              "field": "c",
              "type": "quantitative"
            },
            "y2": {"field": "d"}
          }
        },
        {
          "mark": {
            "type": "bar",
            "color": "yellow",
            "tooltip": true
          },
          "encoding": {
            "y": {
              "field": "d",
              "type": "quantitative"
            },
            "y2": {"field": "e"}
          }
        }
      ],
      "encoding": {
        "x": {"field": "Subcategory"}
      }
    }
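
The spec above needs one label and one layer per quantile, so changing the list of probabilities means editing three places. As a possible alternative, here is a minimal sketch that pairs each quantile with the next one using a window transform, so a single bar mark covers every band. It assumes the same "dataset", "Subcategory" and "Total Sales" names as the question; the field name "upper" and the ordinal colour encoding are my own choices, and I have not tested it in Deneb.

```json
{
  "data": {"name": "dataset"},
  "transform": [
    {
      "quantile": "Total Sales",
      "probs": [0.05, 0.25, 0.5, 0.75, 0.95],
      "groupby": ["Subcategory"]
    },
    {
      "window": [{"op": "lead", "field": "value", "as": "upper"}],
      "sort": [{"field": "prob", "order": "ascending"}],
      "groupby": ["Subcategory"]
    },
    {"filter": "isValid(datum.upper)"}
  ],
  "mark": {"type": "bar", "tooltip": true},
  "encoding": {
    "x": {"field": "Subcategory", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"},
    "y2": {"field": "upper"},
    "color": {"field": "prob", "type": "ordinal"}
  }
}
```

Each row produced by the quantile transform has a "prob" and a "value" field. The "lead" window operation copies the next quantile's value within the same Subcategory into "upper", and the filter drops the last row of each group, which has no next quantile. With this layout, adding or removing an entry in "probs" should be the only change needed; the colours then come from the default ordinal scale rather than the fixed red, blue, green and yellow.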