When dcast
ing a data.table, if an id falls under multiple categories / fields, it returns the length
of values by default.
dta <- (data.table(ID = c("A", "C", "B", "A", "D", "D", "A", "B", "D", "D"),
CATEGORY_DUPLICATE = c(LETTERS[17:23], LETTERS[20:22]),
CATEGORY_UNIQUE = c(LETTERS[17:23], c("T", "U.1", "V.1")),
VALUE = c(400, 400, 500, 300, 500, 100, 200, 300, 400, 500)))
dcast(dta, ID ~ CATEGORY_DUPLICATE, value.var = "VALUE")
ID Q R S T U V W
A 1 0 0 1 0 0 1
B 0 0 1 1 0 0 0
C 0 1 0 0 0 0 0
D 0 0 0 0 2 2 0
If one id corresponds to no more than one category, only then it returns the actual value itself. I have appended a 1 to U
and V
column in the sample table to demonstrate this.
dcast(dta, ID ~ CATEGORY_UNIQUE, value.var = "VALUE")
ID Q R S T U U.1 V V.1 W
A 400 NA NA 300 NA NA NA NA 200
B NA NA 500 300 NA NA NA NA NA
C NA 400 NA NA NA NA NA NA NA
D NA NA NA NA 500 400 100 500 NA
I am looking to achieve this result within the dcast
query itself - such that multiple groups for an id generates multiple columns with actual values in them - instead of the length
of category for the id.
Please help
You can also do:
dcast(dta[, CATEGORY_UNIQUE := make.unique(CATEGORY_DUPLICATE), by = ID],
ID ~ CATEGORY_UNIQUE, value.var = "VALUE")
ID Q R S T U U.1 V V.1 W
1: A 400 NA NA 300 NA NA NA NA 200
2: B NA NA 500 300 NA NA NA NA NA
3: C NA 400 NA NA NA NA NA NA NA
4: D NA NA NA NA 500 400 100 500 NA