text-box statistics for interval selection


How do I show details about the current selection (or everything if unselected)? For example, a given field of the first or last selected value, selection extents, the number of selected values, the mean of the selected values, and so forth.

I'd like the stat text to be table-formatted, ideally below the legend or in the upper-right of the plot. Here is a mock-up:

selection details below legend

This is how I generated the data:

import datetime
import json
import random

dt2 = datetime.datetime.now()
ydt = datetime.timedelta(days=365)
dt1 = dt2 - ydt
dt1 = dt1.timestamp()
dt2 = dt2.timestamp()
times = [random.uniform(dt1, dt2) for i in range(10**5)]
times.sort()
values = []
for t in times:
    d = {}
    t = datetime.datetime.fromtimestamp(t)
    t = t.isoformat(" ", timespec="seconds")
    v = random.uniform(0, 1000)
    c = random.randint(0,1)
    d["time"] = t
    d["data"] = v
    d["category"] = c
    values.append(d)

with open("./test.json", "wt") as f:
    json.dump(values[:10**4], f)

And here is what I've been able to achieve in the Vega Editor (I had to trim the data to fit the link).

Update:

The statistics text layer will be drawn once for each datum, so it must have exactly a single row. The data I need can be reduced by an aggregate transform into a single row, and a text expr used to format the data. Unfortunately aggregate transforms don't understand default values so no rows are produced when the brush is empty. I tried solving this with a custom JS transform, but I couldn't figure out how to load the Vega transform into Vega-Lite. A workaround would be to add an additional layer that displays the default values only when the brush is empty - empty string otherwise. That's kind of ugly.
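The single-row aggregate idea described above could look roughly like the layer below. This is only a sketch: it assumes a selection parameter named `brush` is defined in another layer and that the rows carry a numeric `data` field. The output names (`n`, `mean_data`, `min_data`, `max_data`, `label`) and the `statsLayer` variable are made up for illustration.

```javascript
// Hypothetical statistics layer: keep only brushed rows, collapse them
// into one aggregate row, then build a label string for a text mark.
const statsLayer = {
  transform: [
    { filter: { param: "brush" } },
    {
      aggregate: [
        { op: "count", as: "n" },
        { op: "mean", field: "data", as: "mean_data" },
        { op: "min", field: "data", as: "min_data" },
        { op: "max", field: "data", as: "max_data" }
      ]
    },
    {
      // format() is the Vega expression number formatter (d3-format syntax).
      calculate: "'n=' + datum.n + ' mean=' + format(datum.mean_data, '.1f')",
      as: "label"
    }
  ],
  mark: { type: "text", align: "left", baseline: "top" },
  encoding: {
    x: { value: 15 },
    y: { value: 15 },
    text: { field: "label", type: "nominal" }
  }
};
```

With no `groupby`, the aggregate yields at most one row, so the text mark is drawn at most once. It does not address the empty case described above: when no rows reach the aggregate, there is no row for the text mark to draw.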

I've also figured out how to apply all the aggregate transforms in an expr thanks to the new pluck function. Right now I'm still filtering the data in the transform block, which means the text expr is drawn a variable number of times (zero times when the brush is empty). I'm trying to decouple the data source from the layer. I've seen several examples online that place a transform in the dataset block, or have data contain a list of data objects. I don't know if these answers correspond to older versions of Vega-Lite, because none of the solutions work. Right now my best workaround is a separate layer with a hidden mark, which brings me back to square one.

I think next I will try to move the filter transform into the expr via the clamp function. This will allow that layer to use a data source with exactly one datum, though I am a bit worried about performance. Since exprs do not allow variable assignments, I'm going to be duplicating a lot of calculations in the expr just to obtain my statistics.

If this approach works I could try storing the expr results in a params block as a signal, but I've noticed some performance penalties to using params...

If that doesn't work I could create a custom JS expression function to calculate all the required statistics without any filter or aggregate transforms. The JS function would be able to cache intermediate results for performance.
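The caching idea could be sketched as follows. The helper names `cacheLast`, `countAndMean` and `cachedStats` are hypothetical, and the sketch assumes the data array passed in keeps the same reference between calls, which I have not verified for Vega's `data()` expression function.

```javascript
// Hypothetical helper: wrap a statistics function so that repeated calls
// with the same extent and the same data array reuse the previous result.
function cacheLast(compute) {
  let lastKey = null;
  let lastData = null;
  let lastResult = null;
  return function (extent, data) {
    // A null/undefined extent means "no selection"; give it a fixed key.
    const key = extent ? extent.map(Number).join(",") : "none";
    if (key !== lastKey || data !== lastData) {
      lastKey = key;
      lastData = data;
      lastResult = compute(extent, data);
    }
    return lastResult;
  };
}

// Example statistic: count and mean of `data` for rows inside the extent
// (all rows when there is no extent).
function countAndMean(extent, rows) {
  const inside = extent
    ? rows.filter(r => r.time >= extent[0] && r.time <= extent[1])
    : rows;
  const sum = inside.reduce((acc, r) => acc + r.data, 0);
  return {
    count: inside.length,
    mean: inside.length ? sum / inside.length : null
  };
}

const cachedStats = cacheLast(countAndMean);

// In the page, the wrapped function could be registered before embedding:
// vega.expressionFunction("cachedStats", cachedStats);
```

Only the most recent result is kept, which is enough while the brush is idle but the expression is re-evaluated for other reasons.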


Solution

  • I've decided to go the JS route. See code and attached image.

    <!doctype html>
    <html>
      <head>
        <title>Too Much Data</title>
        <meta charset="utf-8" />
    
        <script src="https://cdn.jsdelivr.net/npm/vega@5/build/vega.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/vega-lite@5/build/vega-lite.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/vega-embed@6/build/vega-embed.js"></script>
    
        <style media="screen">
        h1 {
            text-align: center;
            font-family: Georgia, serif
        }
        #vis {
            width: 100%;
        }
        </style>
      </head>
      <body>
        <h1>Too Much Data</h1>
        <!-- Container for the visualization -->
        <div id="vis"></div>
    
        <script>
          // Assign the specification to a local variable vlSpec.
          var vlSpec =
            { $schema: "https://vega.github.io/schema/vega-lite/v5.json"
            , datasets:
              { source: [{"time": "2023-08-31 15:12:40", "data": 265.1037232391961, "category": 1}, {"time": "2023-08-31 15:15:26", "data": 989.2391954577464, "category": 1}, {"time": "2023-08-31 15:15:29", "data": 426.3748533977788, "category": 0}, {"time": "2023-08-31 15:18:11", "data": 563.7725296786067, "category": 1}, {"time": "2023-08-31 15:21:25", "data": 322.4493083566362, "category": 0}, {"time": "2023-08-31 15:24:54", "data": 822.6771646740021, "category": 1}, {"time": "2023-08-31 15:28:10", "data": 294.37404484299054, "category": 1}, {"time": "2023-08-31 15:28:22", "data": 838.7462086185608, "category": 0}, {"time": "2023-08-31 15:35:58", "data": 961.3893188770259, "category": 0}, {"time": "2023-08-31 15:36:07", "data": 241.45631836625802, "category": 0}, {"time": "2023-08-31 15:49:33", "data": 191.96326506124362, "category": 0}, {"time": "2023-08-31 15:51:15", "data": 450.50733664623965, "category": 1}, {"time": "2023-08-31 15:56:35", "data": 390.5921971631632, "category": 1}, {"time": "2023-08-31 15:57:52", "data": 829.8364876130439, "category": 1}, {"time": "2023-08-31 16:01:06", "data": 996.0349700996576, "category": 0}, {"time": "2023-08-31 16:02:36", "data": 78.24722444300802, "category": 0}, {"time": "2023-08-31 16:14:05", "data": 942.3350040849994, "category": 0}, {"time": "2023-08-31 16:14:56", "data": 860.58714895142, "category": 1}, {"time": "2023-08-31 16:15:08", "data": 515.199102407516, "category": 1}, {"time": "2023-08-31 16:23:20", "data": 166.05721829849873, "category": 1}, {"time": "2023-08-31 16:30:05", "data": 439.73137493646266, "category": 0}, {"time": "2023-08-31 16:32:31", "data": 869.245076742056, "category": 0}, {"time": "2023-08-31 16:35:48", "data": 480.50968063008304, "category": 1}, {"time": "2023-08-31 16:37:50", "data": 476.877035209344, "category": 1}, {"time": "2023-08-31 16:39:36", "data": 733.3017448826324, "category": 0}, {"time": "2023-08-31 16:44:17", "data": 636.686519092496, "category": 1}, {"time": "2023-08-31 
16:45:08", "data": 694.5261775005811, "category": 0}, {"time": "2023-08-31 16:51:36", "data": 695.7401884245502, "category": 0}, {"time": "2023-08-31 16:55:29", "data": 570.0935946720598, "category": 1}, {"time": "2023-08-31 16:57:05", "data": 277.22052647262717, "category": 0}, {"time": "2023-08-31 16:58:27", "data": 480.36926264607274, "category": 1}, {"time": "2023-08-31 17:02:34", "data": 893.3698570026319, "category": 1}, {"time": "2023-08-31 17:05:32", "data": 236.71895124154685, "category": 1}, {"time": "2023-08-31 17:08:46", "data": 573.0841835923452, "category": 0}, {"time": "2023-08-31 17:14:37", "data": 191.7254918774728, "category": 0}, {"time": "2023-08-31 17:16:43", "data": 94.93763899240804, "category": 1}, {"time": "2023-08-31 17:24:40", "data": 936.4038465823089, "category": 0}, {"time": "2023-08-31 17:31:09", "data": 390.84825100994567, "category": 1}, {"time": "2023-08-31 17:35:35", "data": 14.48187309843274, "category": 0}, {"time": "2023-08-31 17:35:47", "data": 443.05398617944593, "category": 1}, {"time": "2023-08-31 17:40:44", "data": 30.0828399028229, "category": 0}, {"time": "2023-08-31 17:48:33", "data": 768.0549896500464, "category": 1}, {"time": "2023-08-31 17:53:29", "data": 71.57068127924227, "category": 0}, {"time": "2023-08-31 18:04:56", "data": 594.7138236213322, "category": 0}, {"time": "2023-08-31 18:06:44", "data": 29.21370270526036, "category": 0}, {"time": "2023-08-31 18:28:22", "data": 852.7093808483378, "category": 1}, {"time": "2023-08-31 18:30:01", "data": 576.9728506525139, "category": 1}, {"time": "2023-08-31 18:31:41", "data": 968.1882202042807, "category": 1}, {"time": "2023-08-31 18:31:51", "data": 185.6873327854428, "category": 1}, {"time": "2023-08-31 18:33:31", "data": 258.211113709635, "category": 0}, {"time": "2023-08-31 18:36:36", "data": 641.264570256715, "category": 1}, {"time": "2023-08-31 18:39:52", "data": 717.6143367808544, "category": 1}, {"time": "2023-08-31 18:39:52", "data": 191.4611806426172, 
"category": 1}, {"time": "2023-08-31 18:41:38", "data": 136.9116350629923, "category": 0}, {"time": "2023-08-31 18:57:48", "data": 62.11343548023751, "category": 1}, {"time": "2023-08-31 18:58:26", "data": 529.5089127094398, "category": 0}, {"time": "2023-08-31 19:07:54", "data": 153.13269404700824, "category": 1}, {"time": "2023-08-31 19:09:17", "data": 705.4049459845114, "category": 0}, {"time": "2023-08-31 19:11:07", "data": 300.90132125121005, "category": 1}, {"time": "2023-08-31 19:20:25", "data": 946.4725291504993, "category": 1}, {"time": "2023-08-31 19:23:48", "data": 319.04133613813724, "category": 1}, {"time": "2023-08-31 19:24:25", "data": 464.2923297748929, "category": 1}, {"time": "2023-08-31 19:28:02", "data": 836.436678063193, "category": 1}, {"time": "2023-08-31 19:28:08", "data": 5.992853044164859, "category": 1}, {"time": "2023-08-31 19:40:01", "data": 873.6847072580948, "category": 1}, {"time": "2023-08-31 19:43:41", "data": 431.0286183407737, "category": 1}, {"time": "2023-08-31 19:51:22", "data": 396.43260404732825, "category": 0}, {"time": "2023-08-31 19:54:08", "data": 575.9715221353141, "category": 0}, {"time": "2023-08-31 19:55:53", "data": 44.016217670442614, "category": 0}, {"time": "2023-08-31 19:58:14", "data": 988.9639046666363, "category": 1}, {"time": "2023-08-31 20:05:53", "data": 742.2798696276691, "category": 0}, {"time": "2023-08-31 20:07:13", "data": 982.7119961613008, "category": 0}, {"time": "2023-08-31 20:15:20", "data": 976.3381077100345, "category": 1}, {"time": "2023-08-31 20:20:15", "data": 498.5276910780252, "category": 0}, {"time": "2023-08-31 20:22:29", "data": 301.4863894174468, "category": 1}, {"time": "2023-08-31 20:31:03", "data": 232.56452406666895, "category": 1}, {"time": "2023-08-31 20:33:52", "data": 694.171014904713, "category": 1}, {"time": "2023-08-31 20:35:45", "data": 102.79567934930212, "category": 1}, {"time": "2023-08-31 20:47:32", "data": 431.64822699883376, "category": 1}, {"time": "2023-08-31 
20:55:19", "data": 683.217576875891, "category": 0}, {"time": "2023-08-31 20:55:36", "data": 879.5945045918183, "category": 1}, {"time": "2023-08-31 21:04:28", "data": 164.6834561802648, "category": 1}, {"time": "2023-08-31 21:06:04", "data": 22.588620229922583, "category": 1}, {"time": "2023-08-31 21:07:10", "data": 757.0796861192514, "category": 1}, {"time": "2023-08-31 21:23:43", "data": 848.456892794343, "category": 1}, {"time": "2023-08-31 21:34:38", "data": 447.89147371830785, "category": 1}, {"time": "2023-08-31 21:45:30", "data": 862.3116375036777, "category": 1}, {"time": "2023-08-31 21:47:00", "data": 967.0312319533795, "category": 0}, {"time": "2023-08-31 21:47:56", "data": 966.4938018703745, "category": 1}, {"time": "2023-08-31 21:49:45", "data": 890.2567189914545, "category": 0}, {"time": "2023-08-31 21:55:40", "data": 362.80312104639677, "category": 1}, {"time": "2023-08-31 21:58:55", "data": 834.7469369912607, "category": 1}, {"time": "2023-08-31 22:01:02", "data": 584.1447613550432, "category": 1}, {"time": "2023-08-31 22:01:06", "data": 82.66592460479994, "category": 1}, {"time": "2023-08-31 22:02:00", "data": 332.67959271479384, "category": 0}, {"time": "2023-08-31 22:02:32", "data": 316.51081491367347, "category": 0}, {"time": "2023-08-31 22:08:31", "data": 336.10098602094985, "category": 1}, {"time": "2023-08-31 22:18:52", "data": 873.7313013506864, "category": 0}, {"time": "2023-08-31 22:19:24", "data": 312.42947148514776, "category": 1}, {"time": "2023-08-31 22:28:48", "data": 582.8096654568776, "category": 1}]
              }
            , data: {name: "source"}
            , transform:
              [ {filter: "datum.data > 0"}
              ]
            , title: "Too Much Data"
            , config: { font: "monospace" }
            , width: "container"
            , layer:
              [ { params:
                  [ { name: "grid"
                    , bind: "scales"
                    , select:
                      { type: "interval"
                      , encodings: ["x"]
                      , on: "[mousedown[!event.shiftKey], mouseup] > mousemove"
                      , translate: "[mousedown[!event.shiftKey], mouseup] > mousemove!"
                      }
                    }
                  , { name: "brush"
                    , select:
                      { type: "interval"
                      , encodings: ["x"]
                      , on: "[mousedown[event.shiftKey], mouseup] > mousemove"
                      , translate: "[mousedown[event.shiftKey], mouseup] > mousemove!"
                      }
                    }
                  ]
                , mark: "point"
                , encoding:
                  { x: {field: "time", type: "temporal"}
                  , y: {field: "data", type: "quantitative"}
                  , color: {field: "category", type: "nominal"}
                  }
                }
              , { data: {values: [{}]}
                , mark:
                  { type: "rect"
                  , fillOpacity: 0.5
                  , stroke: "darkgrey"
                  , strokeWidth: 2
                  , fill: "white"
                  }
                , encoding:
                  { x: {value: 10}
                  , y: {value: 10}
                  , x2: {value: 186}
                  , y2: {value: 108}
                  }
                }
              , { data: {values: [{}]}
                , mark:
                  { type: "text"
                  , align: "left"
                  , baseline: "top"
                  , text: {expr: "statistics(grid_time, brush_time, data('data_0'))"}
                  }
                , encoding:
                  { x: {value: 15}
                  , y: {value: 15}
                  }
                }
              ]
            }
    
          // Leftmost insertion point: first index i with key(xs[i]) >= x.
          function bisect_left(xs, x, key=i=>i) {
            let lo=0;
            let hi=xs.length;
            let mid;
            while (lo < hi) {
              mid = Math.floor((lo + hi) / 2)
              if (x <= key(xs[mid]))
                hi = mid
              else
                lo = mid + 1
            }
            return lo
          }
    
          // Rightmost insertion point: first index i with key(xs[i]) > x.
          function bisect_right(xs, x, key=i=>i) {
            let lo=0;
            let hi=xs.length;
            let mid;
            while (lo < hi) {
              mid = Math.floor((lo + hi) / 2)
              if (x >= key(xs[mid]))
                lo = mid + 1
              else
                hi = mid
            }
            return lo
          }
    
          function statistics(grid_time, brush_time, data) {
            let results =
                { ...statistics_grid(grid_time, data)
                , ...statistics_brush(brush_time)
                , ...statistics_brush_data(brush_time, data)
                }
              , delta = results.delta ? " over " + results.delta : ""
              ;
            return (
              [ "view_l " + results.view[0]
              , "view_r " + results.view[1]
              , "sel_l  " + results.selection[0]
              , "sel_r  " + results.selection[1]
              , "dat_l  " + results.data[0]
              , "dat_r  " + results.data[1]
              , "rate   " + results.count + delta
              ])
          }
    
          function statistics_grid(grid_time, data) {
            const fmt = i => datefmt(new Date(i));
            if (!grid_time)
              grid_time = [data[0].time, data[data.length-1].time]
            return {view: grid_time.map(fmt)}
          }
    
          function statistics_brush(brush_time) {
            let results = {selection: ["N/A", "N/A"], delta: ""};
            if (!brush_time)
              return results
            // brush extents differ in milliseconds; deltafmt expects seconds
            const delta = deltafmt((brush_time[1] - brush_time[0]) / 1000)
            results.selection = brush_time.map(datefmt)
            results.delta = delta
            return results
          }
    
          function statistics_brush_data(brush_time, data) {
            let results = {data: ["N/A", "N/A"], count: 0};
            const fmt = i => datefmt(new Date(data[i].time));
            if (!brush_time)
              return results
            const indexes = islice(...brush_time, data, i=>i.time)
            // no overlap
            if (indexes[0] >= data.length || indexes[1] < 0)
              return results
            // no selection
            if (indexes[0] > indexes[1])
              return results
            results.data = indexes.map(fmt)
            results.count = indexes[1] - indexes[0] + 1
            return results
          }
    
          function islice(a, b, data, key) {
            // inclusive index range [ai, bi] of the rows whose key lies in [a, b]
            const ai = bisect_left(data, a, key)
            const bi = bisect_right(data, b, key) - 1
            return [ai, bi]
          }
    
          function datefmt(d) {
            if (!(d instanceof Date))
              return d
            const pad = n => (n.toString().padStart(2, "0"));
            let year = d.getFullYear();
            // getMonth() is zero-based (January is 0), so add 1 for display
            let month = pad(d.getMonth() + 1);
            let day = pad(d.getDate());
            let hours = pad(d.getHours());
            let minutes = pad(d.getMinutes());
            let seconds = pad(d.getSeconds());
            return (year + "-" + month + "-" + day + " " +
              hours + ":" + minutes + ":" + seconds)
          }
    
          function deltafmt(s) {
            let seconds, minutes, hours, r;
            [s, seconds] = divmod(s, 60);
            [s, minutes] = divmod(s, 60);
            [s, hours] = divmod(s, 24);
            // zero-pad every field so the result reads as hh:mm:ss
            seconds = seconds.toString().padStart(2, "0")
            minutes = minutes.toString().padStart(2, "0")
            hours = hours.toString().padStart(2, "0")
            r = `${hours}:${minutes}:${seconds}`
            if (s)
              r = `${s}d ${r}`
            return r
          }
    
          function divmod(x, y) {
            let r = x % y;
            let q = (x - r) / y;
            return [q, r].map(Math.trunc)
          }
    
          vega.expressionFunction("statistics", statistics)
    
          // Embed the visualization in the container with id `vis`
          vegaEmbed('#vis', vlSpec).then(function(result) {
          // Access the Vega view instance as result.view
          // (https://vega.github.io/vega/docs/api/view/)
          }).catch(console.error);
        </script>
      </body>
    </html>
    

    overlay statistics
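To check the index arithmetic behind the selection count, here is a small standalone sketch of the same two-bisection idea on toy data. The names `lowerBound`, `upperBound` and `inclusiveRange` are invented for this example and are not part of the page above; the rows are assumed to be sorted by `time`.

```javascript
// First index whose key is >= x (leftmost insertion point).
function lowerBound(rows, x, key = v => v) {
  let lo = 0, hi = rows.length;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (key(rows[mid]) < x) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// First index whose key is > x (rightmost insertion point).
function upperBound(rows, x, key = v => v) {
  let lo = 0, hi = rows.length;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (key(rows[mid]) <= x) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Inclusive [first, last] indexes of rows with lo <= key <= hi.
// An empty range shows up as first > last.
function inclusiveRange(rows, lo, hi, key) {
  return [lowerBound(rows, lo, key), upperBound(rows, hi, key) - 1];
}

const rows = [10, 20, 20, 30, 40].map(t => ({ time: t }));
const byTime = r => r.time;

console.log(inclusiveRange(rows, 20, 30, byTime)); // [ 1, 3 ] -> 3 rows
console.log(inclusiveRange(rows, 21, 29, byTime)); // [ 3, 2 ] -> empty
console.log(inclusiveRange(rows, 50, 60, byTime)); // [ 5, 4 ] -> empty, right of the data
```

The count of selected rows is `last - first + 1`, and each lookup takes O(log n) comparisons, so no filter or aggregate pass over the data is needed.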