So my situation is the following. I have 60,000 images, and 4 models were trained on them. Each model tries to predict what is in the image, so I end up with a dataset in which each image shows up 4 times, once per model. Now I'd like to group the images according to how many models got each one correct. In other words, one image might have been predicted correctly by all 4 models and another incorrectly by all 4, so the first should be in category 4 and the second in category 0.
How can I do this using Vega-Lite? (I know I could preprocess the data, but I'd like to do it directly with Vega-Lite.) I've tried the following, but without success:
vl.markPoint()
  .data(data)
  .transform(
    vl.groupby('image_id'),
    vl.joinaggregate([{
      "op": "sum",
      "field": "acc",
      "as": "totalacc"
    }]),
    vl.calculate("datum.totalacc").as('total')
  )
  .encode(
    vl.y().sum('acc'),
    vl.x().fieldQ('total'),
    vl.detail().fieldQ('image_id')
  )
  .render();
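vl.groupby() by itself doesn't do anything to the data; it only takes effect when an aggregation such as joinaggregate is chained onto it, so that the sum is computed per image_id. I suspect what you probably want for your transforms is something like this: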
.transform(
  vl.groupby('image_id')
    .joinaggregate([{
      "op": "sum",
      "field": "acc",
      "as": "totalacc"
    }]),
  vl.calculate("datum.totalacc").as('total')
)
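
If it helps to see what the grouped joinaggregate should produce, here is a rough plain-JavaScript sketch of the same computation. It does not use Vega-Lite at all; the sample rows, the `model` field, and the helper name `joinAggregateSum` are made up for illustration, and I'm assuming `acc` is 1 for a correct prediction and 0 otherwise.

```javascript
// Hypothetical sample data: one row per (image, model), acc = 1 if correct.
const rows = [
  ['a', [1, 1, 1, 1]], // correct in all 4 models
  ['b', [0, 0, 0, 0]], // incorrect in all 4 models
  ['c', [1, 0, 1, 0]], // correct in 2 of 4 models
].flatMap(([image_id, accs]) =>
  accs.map((acc, i) => ({ image_id, model: i + 1, acc }))
);

// Emulates a joinaggregate "sum" grouped by one field: every input row is
// kept, and each row gains the total computed over its group.
function joinAggregateSum(data, groupField, valueField, asField) {
  const totals = new Map();
  for (const d of data) {
    const key = d[groupField];
    totals.set(key, (totals.get(key) || 0) + d[valueField]);
  }
  return data.map(d => ({ ...d, [asField]: totals.get(d[groupField]) }));
}

const joined = joinAggregateSum(rows, 'image_id', 'acc', 'totalacc');
console.log(joined);
```

Note that the number of rows stays the same (still 4 per image); each row just carries its image's total in totalacc, which is why the ungrouped version gives every row the same grand total instead.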