I'm trying to create some simulated data. To create clustered data, I have assigned whether prescribers work in one or more than one local health area (LHA). Now, I am trying to assign a prescriber for a patient based on their LHA. The code for that is in the following codeblock.
for (i in seq_along(data$LHA)) {
data$prescriber_id[i] <- sample(x = number_of_LHAs_worked$prescriber_id[
number_of_LHAs_worked$assigned_LHAs_2 == data$LHA[i]],
size = 1)
}
This loop works well for prescribers in more than one LHA (i.e. when the length of x given to the sample function is larger than 1). However, it fails when a prescriber works in only one LHA, due to the behaviour of the sample function.
sample(x = 154, size = 1)
When x is a single number (of at least 1), sample treats it as the range 1:x and randomly chooses a number from that range, so the call above can return any integer from 1 to 154 rather than always returning 154.
While I've worked out a solution for my purposes, I'm interested in seeing whether others have figured out ways to make the sample function behave more consistently. Specifically, I'd like to force the sample function to draw only from the set specified.
sample(x = 154:155, size = 1) # here the function chooses only a number in the set {154, 155}.
?sample
supplies an answer in its examples:
set.seed(47)
resample <- function(x, ...) x[sample.int(length(x), ...)]
# infers 100 means 1:100
sample(100, 1)
#> [1] 98
# stricter
resample(100, 1)
#> [1] 100
# still works normally if explicit
resample(1:100, 1)
#> [1] 77
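Applied to the loop from the question, this could look roughly like the sketch below. The two small data frames are made-up stand-ins for the real `data` and `number_of_LHAs_worked` (the IDs and LHA labels are assumptions for illustration only); prescriber 154 is deliberately the only candidate in LHA "B" to hit the length-one case.

```r
# resample() as defined in the examples of ?sample
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical toy data: prescriber 154 is the only one assigned to LHA "B"
number_of_LHAs_worked <- data.frame(
  prescriber_id   = c(101, 102, 154),
  assigned_LHAs_2 = c("A", "A", "B"),
  stringsAsFactors = FALSE
)
data <- data.frame(LHA = c("A", "B", "B", "A"), stringsAsFactors = FALSE)

# Pre-allocate the column, then fill it one patient at a time
data$prescriber_id <- NA_real_
for (i in seq_along(data$LHA)) {
  # All prescribers who work in this patient's LHA
  candidates <- number_of_LHAs_worked$prescriber_id[
    number_of_LHAs_worked$assigned_LHAs_2 == data$LHA[i]]
  # resample() draws from the candidates themselves, even when there is only one
  data$prescriber_id[i] <- resample(candidates, size = 1)
}
```

With this version, patients in LHA "B" should always get prescriber 154, while patients in LHA "A" get either 101 or 102. Note that `resample` will still raise an error if an LHA has no prescribers at all, since there is nothing to draw from.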