I have a data frame of almost 50,000 rows spread across 15 different IDs (every ID has thousands of observations). The data frame looks like this:
ID Year Temp ph
1 P1 1996 11.3 6.80
2 P1 1996 9.7 6.90
3 P1 1997 9.8 7.10
...
2000 P2 1997 10.5 6.90
2001 P2 1997 9.9 7.00
2002 P2 1997 10.0 6.93
I want to take 500 random rows for every ID (so 500 for P1, 500 for P2, ...) and create a new df. I tried:
new_df <- df[df$ID %in% sample(unique(df$ID), 500), ]
But this picks IDs at random rather than rows, whereas I need 500 random rows for every ID.
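Try this, which samples 500 row indices within each ID group and binds the pieces back into one data frame:
library(plyr)
# For each ID, draw 500 rows without replacement (every ID must have at least 500 rows)
new_df <- ddply(df, .(ID), function(x) x[sample(nrow(x), 500), ])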
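If you would rather not load plyr, the same per-ID sampling can be sketched in base R. This is only a minimal sketch: the toy `df` below is made-up data standing in for yours, `n_per_id` is a name I introduced for illustration, and the `min()` guard (which keeps every row of an ID that has fewer than 500) is my own assumption about what you would want in that case.

```r
# Hypothetical toy data standing in for the real data frame
set.seed(42)
df <- data.frame(
  ID   = rep(c("P1", "P2", "P3"), times = c(2000, 1500, 1000)),
  Year = sample(1996:2000, 4500, replace = TRUE),
  Temp = round(runif(4500, 9, 12), 1),
  ph   = round(runif(4500, 6.5, 7.5), 2)
)

n_per_id <- 500  # rows to draw per ID (assumed name, not from the question)

# Split the row numbers by ID, then sample positions within each group.
# sample.int() on the group length avoids the sample(x, ...) surprise
# when a group happens to contain a single row number.
idx <- unlist(
  lapply(split(seq_len(nrow(df)), df$ID), function(rows) {
    rows[sample.int(length(rows), min(n_per_id, length(rows)))]
  }),
  use.names = FALSE
)

# Subset once with the collected row numbers
new_df <- df[idx, ]

# Check how many rows were drawn per ID
table(new_df$ID)
```

Sampling row numbers and subsetting once avoids building many intermediate data frames, which tends to matter little at 50,000 rows but keeps the original row names so you can trace each sampled row back to `df`.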