Given the following data.table:
test <- data.table(name = c(rep(1,20), rep(2,20), rep(3,20)), type = c(rep("apple",10), rep("pear",10), rep("apple",15), rep("pear",5), rep("pear",20)))
I want to get this result:
    name  diff
   <num> <num>
1:     1     0
2:     2    10
3:     3   -20
where diff is the number of apples minus the number of pears for each name.
I tried something like:
results <- test[, .(diff = table(type)[1] - table(type)[2]), by = name]
But it doesn't work when a name has only one type, because it returns NA.
The issue is that table(type) only contains an entry for each type that actually occurs within the current group. For name 3 there are only pears, so table(type) has a single entry: table(type)[1] is the pear count (20), not an apple count, and table(type)[2] does not exist, so the subtraction returns NA. Indexing the table by position is also fragile in general, because it relies on "apple" always sorting before "pear".
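To see the failure in isolation, here is a minimal sketch that reproduces what happens inside the group for name 3, using a plain character vector of 20 pears (the variable names type and counts are just for illustration):

```r
# Group 3 from the question contains only pears
type <- rep("pear", 20)

counts <- table(type)

# The table has one entry only, so position 1 is the pear count
first <- counts[1]

# Position 2 does not exist; indexing past the end yields NA
second <- counts[2]

# Hence the difference is NA instead of -20
diff_value <- unname(first - second)
print(diff_value)
```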
Instead, count each type separately for each name and subtract the pear count from the apple count. sum(type == "apple") is 0 rather than NA when a name has no apples, so the difference is always defined.
Correct code :)
results <- test[, .(diff = sum(type == "apple") - sum(type == "pear")), by = name]
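As a quick check, the sketch below rebuilds the data from the question and runs the corrected line. It also shows one possible way to keep the table() approach, assuming you fix the levels up front so both types always appear and then index by name instead of by position. The names lv and results_tab are introduced here for illustration only.

```r
library(data.table)

test <- data.table(
  name = c(rep(1, 20), rep(2, 20), rep(3, 20)),
  type = c(rep("apple", 10), rep("pear", 10),
           rep("apple", 15), rep("pear", 5),
           rep("pear", 20))
)

# Logical sums give 0 for a type that is absent from a group, never NA
results <- test[, .(diff = sum(type == "apple") - sum(type == "pear")), by = name]

# Alternative sketch: fixed factor levels guarantee both entries exist,
# and indexing by name avoids depending on sort order
lv <- c("apple", "pear")
results_tab <- test[, {
  counts <- table(factor(type, levels = lv))
  list(diff = counts[["apple"]] - counts[["pear"]])
}, by = name]

print(results)
print(results_tab)
```

Both versions give a diff of 0, 10 and -20 for names 1, 2 and 3. The sum() version is shorter and is probably the better default. The factor version may be handy if you later need counts for more than two types.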