I have the following R data.table
:
library(data.table)
dt =
unique_point biased data_points team groupID
1: up1 FALSE 3 1 xy28352
2: up1 TRUE 4 22 xy28352
3: up2 FALSE 1 4 xy28352
4: up2 TRUE 0 3 xy28352
5: up3 FALSE 12 5 xy28352
6: up3 TRUE 35 7 xy28352
....
I've formatted the data.table such that for each unique_point
, I am measuring the data points for unbiased
and biased
. So each unique_point
has two rows, biased FALSE and biased TRUE. If there are no measurements, this is recorded as 0.
As an example, for up1
, there are 3 data points for the unbiased experiment, and 4 data points for the biased experiment.
Each groupID
has 25 teams, each with potentially with a measurement for biased
and unbiased
. I would like to re-format the data.table so it calculates the number of data points by team as well, for each unique data points (due to the data, this will make rows have data_points
of 0).
unique_point biased data_points team groupID
1: up1 FALSE 3 1 xy28352
2: up1 TRUE 0 1 xy28352
3: up1 FALSE 0 2 xy28352
4: up1 TRUE 0 2 xy28352
5: up1 FALSE 0 3 xy28352
6: up1 TRUE 0 3 xy28352
....
45. up1 TRUE 4 22 xy28352
....
49. up1 FALSE 0 25 xy28352
50. up1 TRUE 0 25 xy28352
This task is very close to somehow "unfolding" the data.table. For each unique_point
, I would create 50 rows, 25 teams with TRUE and FALSE. The added complication is that I need to use the counts
above to fill in the above with the counts.
There should be a way to use unique()
to count the times the rows exist possibly?
If I try
setkey(dt, team, unique_point)[CJ(unique(unique_point), unique(team)), .N, by=.EACHI]
I am counting the number of rows which occur for unique_point
and team
. But this wouldn't keep the data_points
.
Using:
DT2 <- DT[, .SD[CJ(team = 1:25, biased = biased, unique = TRUE), on = .(biased, team)], by = .(unique_point, groupID)
][is.na(data_points), data_points := 0][]
setcolorder(DT2, c(1,3:5,2))
gives:
> DT2 unique_point biased data_points team groupID 1: up1 FALSE 3 1 xy28352 2: up1 TRUE 0 1 xy28352 3: up1 FALSE 0 2 xy28352 4: up1 TRUE 0 2 xy28352 5: up1 FALSE 0 3 xy28352 --- 146: up3 TRUE 0 23 xy28352 147: up3 FALSE 0 24 xy28352 148: up3 TRUE 0 24 xy28352 149: up3 FALSE 0 25 xy28352 150: up3 TRUE 0 25 xy28352
What this does:
DT
by unique_point
and groupID
with by = .(unique_point, groupID)
CJ(team = 1:25, biased = biased)
) for biased
and team
.NA
-values for the rows that are not present in DT
. Therefore you fill them with zero's with the [is.na(data_points), data_points := 0]
part.[]
) aren't necessary, but makes printing on the console requiring a step less extra. For more info, see here.Using setcolorder(DT2, c(1,3:5,2))
isn't necessary & only necessary if you want to get the exact same column order as described in the question.
As an alternative, you could also use:
DT2 <- DT[CJ(unique_point = unique_point, biased = biased, team = 1:25, groupID = groupID, unique = TRUE),
on = .(unique_point, biased, team, groupID)
][is.na(data_points), data_points := 0][]
The full first 60 rows:
> DT2[1:60] unique_point biased data_points team groupID 1: up1 FALSE 3 1 xy28352 2: up1 TRUE 0 1 xy28352 3: up1 FALSE 0 2 xy28352 4: up1 TRUE 0 2 xy28352 5: up1 FALSE 0 3 xy28352 6: up1 TRUE 0 3 xy28352 7: up1 FALSE 0 4 xy28352 8: up1 TRUE 0 4 xy28352 9: up1 FALSE 0 5 xy28352 10: up1 TRUE 0 5 xy28352 11: up1 FALSE 0 6 xy28352 12: up1 TRUE 0 6 xy28352 13: up1 FALSE 0 7 xy28352 14: up1 TRUE 0 7 xy28352 15: up1 FALSE 0 8 xy28352 16: up1 TRUE 0 8 xy28352 17: up1 FALSE 0 9 xy28352 18: up1 TRUE 0 9 xy28352 19: up1 FALSE 0 10 xy28352 20: up1 TRUE 0 10 xy28352 21: up1 FALSE 0 11 xy28352 22: up1 TRUE 0 11 xy28352 23: up1 FALSE 0 12 xy28352 24: up1 TRUE 0 12 xy28352 25: up1 FALSE 0 13 xy28352 26: up1 TRUE 0 13 xy28352 27: up1 FALSE 0 14 xy28352 28: up1 TRUE 0 14 xy28352 29: up1 FALSE 0 15 xy28352 30: up1 TRUE 0 15 xy28352 31: up1 FALSE 0 16 xy28352 32: up1 TRUE 0 16 xy28352 33: up1 FALSE 0 17 xy28352 34: up1 TRUE 0 17 xy28352 35: up1 FALSE 0 18 xy28352 36: up1 TRUE 0 18 xy28352 37: up1 FALSE 0 19 xy28352 38: up1 TRUE 0 19 xy28352 39: up1 FALSE 0 20 xy28352 40: up1 TRUE 0 20 xy28352 41: up1 FALSE 0 21 xy28352 42: up1 TRUE 0 21 xy28352 43: up1 FALSE 0 22 xy28352 44: up1 TRUE 4 22 xy28352 45: up1 FALSE 0 23 xy28352 46: up1 TRUE 0 23 xy28352 47: up1 FALSE 0 24 xy28352 48: up1 TRUE 0 24 xy28352 49: up1 FALSE 0 25 xy28352 50: up1 TRUE 0 25 xy28352 51: up2 FALSE 0 1 xy28352 52: up2 TRUE 0 1 xy28352 53: up2 FALSE 0 2 xy28352 54: up2 TRUE 0 2 xy28352 55: up2 FALSE 0 3 xy28352 56: up2 TRUE 0 3 xy28352 57: up2 FALSE 1 4 xy28352 58: up2 TRUE 0 4 xy28352 59: up2 FALSE 0 5 xy28352 60: up2 TRUE 0 5 xy28352
Used data:
DT <- fread('unique_point biased data_points team groupID
up1 FALSE 3 1 xy28352
up1 TRUE 4 22 xy28352
up2 FALSE 1 4 xy28352
up2 TRUE 0 3 xy28352
up3 FALSE 12 5 xy28352
up3 TRUE 35 7 xy28352')