
Sample with only 1 number


I'm trying to create some simulated data. To create clustered data, I have assigned whether each prescriber works in one or more than one local health area (LHA). Now I am trying to assign a prescriber to each patient based on the patient's LHA. The code for that is in the following code block.

for (i in seq_along(data$LHA)) {
  data$prescriber_id[i] <- sample(x = number_of_LHAs_worked$prescriber_id[
    number_of_LHAs_worked$assigned_LHAs_2 == data$LHA[i]], 
                                  size = 1)
}

This loop works well for prescribers in more than one LHA (i.e. when the length of x given to the sample function is larger than 1). However, it fails when a prescriber works in only one LHA (x has length 1), due to the behaviour of the sample function.

sample(x = 154, size = 1) 

When given only one number for x, R treats it as the range 1:x and randomly chooses a number from that range, rather than returning x itself.
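A quick way to see this is to repeat the call many times and look at the values that come back. The snippet below is only an illustrative check; the seed and the number of repetitions are arbitrary choices, not part of my simulation.

```r
set.seed(1)  # arbitrary seed, only so the check is repeatable

# Repeat the single-number call many times
draws <- replicate(1000, sample(x = 154, size = 1))

# Every draw lies somewhere in 1:154 ...
all(draws %in% 1:154)

# ... and the draws are not all equal to 154, so 154 was treated
# as an upper bound rather than as a one-element set
length(unique(draws)) > 1
```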

While I've worked out a solution for my purposes, I'm interested in seeing whether others have figured out ways to make the sample function behave more consistently. Specifically, is there a way to force sample to draw only from the set specified?

sample(x = 154:155, size = 1)    # here the function chooses only a number in the set {154, 155}. 

Solution

  • ?sample supplies an answer in its examples:

    set.seed(47)
    
    resample <- function(x, ...) x[sample.int(length(x), ...)]
    
    # infers 100 means 1:100
    sample(100, 1)
    #> [1] 98
    
    # stricter
    resample(100, 1)
    #> [1] 100
    
    # still works normally if explicit
    resample(1:100, 1)
    #> [1] 77
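As a rough sketch of how this could slot into the loop from the question: the two data frames below are made-up stand-ins (the IDs 154, 201, 202 and the LHA labels "A" and "B" are hypothetical), and only the call to sample is swapped for resample.

```r
# Helper from the examples in ?sample
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical stand-ins for the objects in the question
number_of_LHAs_worked <- data.frame(
  prescriber_id   = c(154, 201, 202),
  assigned_LHAs_2 = c("A", "B", "B"),
  stringsAsFactors = FALSE
)
data <- data.frame(LHA = c("A", "B", "A", "B"), stringsAsFactors = FALSE)

set.seed(47)
data$prescriber_id <- NA_real_  # create the column before filling it

for (i in seq_along(data$LHA)) {
  # All prescribers assigned to this patient's LHA
  candidates <- number_of_LHAs_worked$prescriber_id[
    number_of_LHAs_worked$assigned_LHAs_2 == data$LHA[i]
  ]
  # Draws from the candidates themselves, even when there is only one
  data$prescriber_id[i] <- resample(candidates, size = 1)
}

data
```

With this toy data, patients in LHA "A" can only be given prescriber 154, and patients in LHA "B" get 201 or 202. An LHA with no matching prescriber would still raise an error, so that case would need separate handling.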