Rookie here -- I have a large data set of about 75,000 observations and 2000 unique IDs. Therefore, each ID has about 37 observations. Now, how can I take a random sample of unique IDs, say 4, such that I have a new data frame that contains 4 random unique IDs and their corresponding observations for a total of about 150 observations?
Like this:
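df <- data.frame(id = gl(2000, 37), obs = runif(74000)) # Example data set: 2000 IDs x 37 rows each
ids <- sample(levels(df$id), 4)  # draw 4 distinct IDs (sample() is without replacement by default)
df.sub <- df[df$id %in% ids, ]   # keep every row whose id is one of the 4 drawn IDs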
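The `levels()` call works here because `gl()` makes `id` a factor. If the real ID column is numeric or character, `levels()` returns `NULL` and there is nothing to sample. The sketch below uses `unique()` instead, so it also handles those cases. The helper name `sample_ids()` is hypothetical and not part of base R. It indexes with `sample.int()` rather than calling `sample()` on the IDs directly, which sidesteps `sample()`'s special case for a single numeric value.

```r
# Hypothetical helper (not base R): return all rows for n randomly chosen IDs.
# unique() works for factor, character, or numeric ID columns alike.
sample_ids <- function(data, id_col, n) {
  u <- unique(data[[id_col]])
  ids <- u[sample.int(length(u), n)]          # n distinct IDs, without replacement
  data[data[[id_col]] %in% ids, , drop = FALSE]
}

set.seed(42)  # arbitrary seed, only so the draw is reproducible
# Same shape as the example above, but with plain numeric IDs this time
df <- data.frame(id = rep(1:2000, each = 37), obs = runif(74000))
df.sub <- sample_ids(df, "id", 4)

length(unique(df.sub$id))  # 4 distinct IDs
nrow(df.sub)               # 4 * 37 = 148 rows
```

With the real data, where IDs have roughly 37 rows each rather than exactly 37, `nrow(df.sub)` will only be close to 150.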