When creating a pivot table using data.table, I am using the dcast function:
dcast(my_data, var1 ~ var2, length)
This gives a table with rows as var1 lables and column as var2 labels and value as count of cells common to particular row and column.
But instead of length I want to calculate the proportion and put it as the value, i.e. {count of cells common to particular row and column} divided by {count of all cells in the column i.e. a particular level of var2}
I have searched and couldn't able to implement it. Any help would be appreciated.
There is a relatively simple solution, but it requires a second step after the dcast()
.
First, this is the data I am working on:
library(data.table)
set.seed(666)
my_data <- data.table(var1 = sample(letters[1:3], 10, TRUE),
var2 = sample(letters[4:6], 10, TRUE))
var1 var2
1: c f
2: a d
3: c d
4: a d
5: b d
6: c f
7: c d
8: b f
9: a e
10: a e
After the dcast
my_data_dcast <- dcast(my_data, var1 ~ var2, length)
the data looks like this:
var1 d e f
1: a 2 2 0
2: b 1 0 1
3: c 2 0 2
You can then simply go through all columns and divide each element in a column by the sum of all values in a column.
Select the columns to transform:
cols <- unique(my_data$var2)
Go through columns using lapply()
on the subset of columns specified in .SDcols
and override the values of all cols
:
my_data_dcast[, (cols) := (lapply(.SD, function(col) col / sum(col))),
.SDcols = cols]
The final result is this:
var1 d e f
1: a 0.4 1 0.0000000
2: b 0.2 0 0.3333333
3: c 0.4 0 0.6666667