I have created a table called data. This table contains a non-unique ID field.
data <- data.frame(ID = sample(1:5, 10, replace = TRUE))
I have another table called probabilities, which contains matches for the ID field, corresponding ratios and names:
probabilities <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3, 4, 4, 4, 5),
  ratio = c(0.9, 0.1, 0.4, 0.6, 0.8, 0.2, 0.3, 0.3, 0.4, 1.0),
  name  = c("A", "B", "A", "C", "F", "G", "B", "C", "G", "F")
)
I am trying to create a new variable called name in the data table. This will be populated with the name variable from the probabilities table based on the ratio column.
For example, any ID of 1 in the data table should have a 90% chance of being A and a 10% chance of being B. An ID of 4 should have a 30% chance of being B, a 30% chance of being C and a 40% chance of being G, and so on.
Does anyone know how this can be achieved?
I have tried the below but am getting an error:
#load packages
library(dplyr)
#create new variable called name
data <- data %>%
  mutate(name = sample(probabilities$name[ID=probabilities$ID],
                       size = n(),
                       prop = probabilities$ratio[ID=probabilities$ID],
                       replace = TRUE))
Error in mutate():
! Problem while computing name = sample(...).
Caused by error in sample():
! unused argument (prop = probabilities$ratio[name = probabilities$name])
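The error occurs because the argument of sample() is called prob, not prop. Also, ID=probabilities$ID inside [ ] is not a comparison; filtering on a match needs ==. Here is a base R solution, using sapply() and sample(), that draws one name per row of data: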
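data$name <- sapply(data$ID, function(ID) {
  # rows of the lookup table that belong to this ID
  rows <- probabilities$ID == ID
  # draw a single name, weighted by the matching ratios
  sample(x = probabilities[rows, "name"], size = 1, prob = probabilities[rows, "ratio"])
})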
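If data is large, calling sample() once per row can get slow. A possible variation, shown here only as a sketch, is to draw all names for a given ID in a single call. It assumes the two tables from the question; the set.seed() call, the stringsAsFactors = FALSE argument and the helper names idx, p and rows are my own additions for illustration.

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

data <- data.frame(ID = sample(1:5, 10, replace = TRUE))
probabilities <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3, 4, 4, 4, 5),
  ratio = c(0.9, 0.1, 0.4, 0.6, 0.8, 0.2, 0.3, 0.3, 0.4, 1.0),
  name  = c("A", "B", "A", "C", "F", "G", "B", "C", "G", "F"),
  stringsAsFactors = FALSE  # keep name as character on older R versions
)

# row positions of data, grouped by ID (list names are the IDs as strings)
idx <- split(seq_len(nrow(data)), data$ID)

data$name <- NA_character_
for (id in names(idx)) {
  # lookup rows for this ID
  p <- probabilities[probabilities$ID == as.numeric(id), ]
  rows <- idx[[id]]
  # sample row positions of p rather than the names themselves,
  # one weighted draw per matching row of data
  picks <- sample.int(nrow(p), size = length(rows), replace = TRUE, prob = p$ratio)
  data$name[rows] <- p$name[picks]
}
```

Sampling positions with sample.int() instead of the values also sidesteps the special case in sample() where a single numeric x is treated as 1:x, which would matter if the sampled column were numeric rather than character.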