I have a probability data frame, called ptable, like the one below:
unique_id color share
1 red 0.3
1 blue 0.7
2 red 0.4
3 blue 0.5
I'd like to randomly assign a color variable, weighted by the share variable in the probability table, to another data frame, join_table, that looks like this:
unique_id count
1 3
2 4
I understand sample(), but I'm stuck on how to apply the probabilities by the shared unique_id. My latest attempt was:
join_table %>%
group_by(unqiue_id) %>%
mutate(color= sample(ptable$race[unique_id==ptable$unique_id],
size=n(),
prob=ptable$share[nique_id==ptable$unique_id],
replace=TRUE))
Any help would be great.
There were two typos in the code: group_by(unqiue_id) should be group_by(unique_id), and prob=ptable$share[nique_id==ptable$unique_id] should be prob=ptable$share[unique_id==ptable$unique_id]. Also, ptable has no race column, so ptable$race should be ptable$color.
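A quick way to check the corrected subsetting is to evaluate it by hand for a single group. This sketch rebuilds ptable with base data.frame() (an assumption, so it runs without dplyr or tibble) and fixes id to 1, standing in for the unique_id value that mutate() sees inside that group.

```r
# Rebuild ptable with base R so the check needs no packages (assumption)
ptable <- data.frame(
  unique_id = c(1, 1, 2, 3),
  color     = c("red", "blue", "red", "blue"),
  share     = c(0.3, 0.7, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# Stand-in for the unique_id value inside the group for id 1
id <- 1
candidates <- ptable$color[id == ptable$unique_id]  # "red" "blue"
weights    <- ptable$share[id == ptable$unique_id]  # 0.3 0.7

# One weighted draw, matching size = n() for a one-row group
set.seed(1)  # only to make this illustration reproducible
draw <- sample(candidates, size = 1, prob = weights, replace = TRUE)
draw
```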
This should work:
library(dplyr)
ptable <- tibble::tribble(
~unique_id, ~color, ~share,
1, "red", 0.3,
1, "blue", 0.7,
2, "red", 0.4,
3, "blue", 0.5)
join_table <- tibble::tribble(
~unique_id, ~count,
1, 3,
2, 4)
join_table %>%
  group_by(unique_id) %>%
  # within each group, sample from that id's colors, weighted by share
  mutate(color = sample(ptable$color[unique_id == ptable$unique_id],
                        size = n(),
                        prob = ptable$share[unique_id == ptable$unique_id],
                        replace = TRUE))
#> # A tibble: 2 × 3
#> # Groups: unique_id [2]
#> unique_id count color
#> <dbl> <dbl> <chr>
#> 1 1 3 blue
#> 2 2 4 red
Created on 2022-03-01 by the reprex package (v2.0.1)
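If you would rather not rely on dplyr's grouping, a row-wise version in base R is sketched below. It is an alternative to the approach above, not a replacement: assign_color() is a hypothetical helper, and both tables are rebuilt with data.frame() so the block runs on its own. Each row of join_table gets one draw from the colors that share its unique_id, weighted by share.

```r
# Rebuild both tables with base R (assumption, so no packages are needed)
ptable <- data.frame(
  unique_id = c(1, 1, 2, 3),
  color     = c("red", "blue", "red", "blue"),
  share     = c(0.3, 0.7, 0.4, 0.5),
  stringsAsFactors = FALSE
)
join_table <- data.frame(unique_id = c(1, 2), count = c(3, 4))

# Hypothetical helper: one weighted color draw for a single unique_id
assign_color <- function(id, probs) {
  rows <- probs[probs$unique_id == id, ]
  # Sample a row index; prob is used as weights, so shares need not sum to 1
  rows$color[sample.int(nrow(rows), size = 1, prob = rows$share)]
}

set.seed(42)  # only for reproducibility of the illustration
join_table$color <- vapply(join_table$unique_id, assign_color, character(1),
                           probs = ptable)
join_table
```

Sampling a row index with sample.int() sidesteps a quirk of sample(): when x is a single number of at least 1, sample(x) draws from 1:x instead of returning x. That cannot happen here because color is character, but it matters if the sampled column is numeric. Since prob is treated as weights, unique_id 2 (whose only color is red) always gets red even though its share is 0.4. An id with no rows in ptable would cause an error, so check for missing ids first.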