I have a data frame with 811,777 rows and 133 different worker IDs. It looks like this:
   PERS_ID           NEU_DATUM
1       22 2022-03-01 00:00:00
2       22 2022-03-01 00:00:00
3       22 2022-03-01 00:00:00
4       22 2022-03-01 00:00:00
5       22 2022-03-01 00:00:00
6       22 2022-03-01 00:00:00
7       22 2022-03-01 00:00:00
8       22 2022-03-01 00:00:00
9       22 2022-03-01 00:00:00
10      22 2022-03-01 00:00:00
In the first 10 rows you can see only one worker, with ID 22, but as I said above, my data frame has 133 different worker IDs. I want to randomly pick 50 worker IDs and create a new data frame from them. I don't want just one row per ID; I want every row that belongs to each selected worker. So the new data frame should contain all rows of 50 randomly chosen workers. I already tried using sample(), but I couldn't get it to work. Thanks in advance!
If your data frame is called df, you can do the following:
df[df$PERS_ID %in% sample(unique(df$PERS_ID), 50),]
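To see why this keeps whole workers rather than single rows, here is a self-contained sketch on made-up data. It assumes a toy data frame with 133 hypothetical worker IDs and a random number of rows per worker; set.seed() is added only to make the draw reproducible. sample() picks 50 distinct IDs (no replacement by default), and %in% returns one TRUE/FALSE per row, so every row of a selected worker is kept.

```r
# Toy data (hypothetical): 133 worker IDs, each with a random number of rows (1 to 20)
set.seed(42)  # only for reproducibility
n_rows <- sample(1:20, 133, replace = TRUE)
df <- data.frame(
  PERS_ID   = rep(seq_len(133), times = n_rows),
  NEU_DATUM = rep(as.POSIXct("2022-03-01", tz = "UTC"), sum(n_rows))
)

# Draw 50 distinct worker IDs (sample() samples without replacement by default)
sampled_ids <- sample(unique(df$PERS_ID), 50)

# Keep every row whose PERS_ID is one of the sampled IDs
sub <- df[df$PERS_ID %in% sampled_ids, ]

# Sanity checks: 50 workers, and all of their rows are present
length(unique(sub$PERS_ID))            # 50
nrow(sub) == sum(n_rows[sampled_ids])  # TRUE: n_rows[i] is the row count of worker i
```

Note that sample() raises an error if you ask for more IDs than there are unique values, so 50 only works because there are 133 workers.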
Or with data.table:
library(data.table)
setDT(df)[PERS_ID %in% sample(unique(PERS_ID),50)]
Or with dplyr:
library(dplyr)
df %>% filter(PERS_ID %in% sample(unique(PERS_ID),50))
You can also use a join: draw 50 distinct IDs and inner-join them back to df, which keeps every row of those workers. One way to do this with dplyr:
inner_join(
  df,
  df %>% distinct(PERS_ID) %>% slice_sample(n = 50),
  by = "PERS_ID"
)
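The same join idea can be sketched in base R with merge(), shown here on a small made-up data frame (10 hypothetical workers with 3 rows each, sampling 4 of them). One difference to be aware of: merge() sorts the result by the key column by default, so the row order differs from the %in% approach, while the set of rows is the same.

```r
set.seed(1)  # only for reproducibility
# Toy data (hypothetical): 10 workers with 3 rows each
df <- data.frame(
  PERS_ID   = rep(1:10, each = 3),
  NEU_DATUM = rep(as.Date("2022-03-01") + 0:2, times = 10)
)

# One-column data frame of sampled IDs, used as the right-hand side of the join
ids <- data.frame(PERS_ID = sample(unique(df$PERS_ID), 4))

# merge() defaults to an inner join (all = FALSE): only rows whose PERS_ID is in ids survive
joined <- merge(df, ids, by = "PERS_ID")

nrow(joined)  # 12: 4 workers x 3 rows each
```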