
sampling randomly with conditions


My problem sits inside a loop. I have a large dataset (DF), a subset of which looks like this:

ID     Site Species
101     4   x
101     4   y
101     4   z
102     6   x
102     6   z
102     6   a
102     6   b
103     6   a
103     6   z
103     6   c
103     6   x
103     6   y
105     6   x
105     6   y
105     6   a
105     6   z
108     1   x
108     1   a
108     1   c
108     1   z

On each iteration of my loop (index i), I would like to randomly select all rows of one individual ID from each Site. Crucially, only one ID should be chosen per Site. I have a separate function that subsets my large dataset to i Sites, so if i=1 then only one of the above Sites (for example) would be present in the subset.

If i=3, as in this posted example, then I would want all rows of 101, all rows of exactly one of 102, 103 or 105, and all rows of 108.

I think something like ddply() with sample() should do it, but I cannot get the selection to be random.

Any suggestions would be greatly appreciated. Thanks.

James


Solution

  • How about this? I've added a function to simulate what I think your data looks like.

    #dependencies
    library(plyr)
    
    #function to make data (just to work with)
    make_data<-function(id){
      set.seed(id)
      num_sites<-round(runif(1)*3,0)+1
      num_sp<-round(runif(1)*7,0)+1
      sites<-sample(1:10,num_sites,FALSE)
      ldply(sites,function(x)data.frame(sites=x,sp=sample(letters[1:26],num_sp,FALSE)))
    }
    
    #make a data frame for example use (as per question)
    ids<-100:200
    df<-ldply(ids,function(x)data.frame(id=x,make_data(x)))
    
    ################################################
    # HERE'S THE CODE FOR THE ANSWER               #
    # use ddply to summarise by site & sampled ids #
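    # index into unique(id): sample(id,1) weights ids by row count and, for a site with a single row, draws from 1:id
    filter<-ddply(df,.(sites),summarise,set=unique(id)[sample.int(length(unique(id)),1)])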
    # then apply this filter to the original data frame, keeping every row of the sampled id at each site
    ddply(filter,.(sites),.fun=function(x){return(df[df$sites==x$sites & df$id==x$set,])})
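
For comparison, here is a minimal base R sketch of the same idea (no plyr), run on the example rows from the question. The helper name `sample_one_id_per_site` is made up for illustration, and the sketch assumes the column names `ID`, `Site` and `Species` as posted. It picks one ID uniformly at random among the distinct IDs at each Site, then keeps every row of that ID.

```r
# Example data copied from the question
DF <- data.frame(
  ID = c(rep(101, 3), rep(102, 4), rep(103, 5), rep(105, 4), rep(108, 4)),
  Site = c(rep(4, 3), rep(6, 13), rep(1, 4)),
  Species = c("x", "y", "z",
              "x", "z", "a", "b",
              "a", "z", "c", "x", "y",
              "x", "y", "a", "z",
              "x", "a", "c", "z")
)

# Hypothetical helper: keep all rows of one randomly chosen ID per Site
sample_one_id_per_site <- function(data) {
  ids_by_site <- split(data$ID, data$Site)
  chosen <- vapply(ids_by_site, function(ids) {
    u <- unique(ids)
    # sample.int() avoids the sample(x, 1) pitfall when only one ID is present
    u[sample.int(length(u), 1)]
  }, numeric(1))
  # names(chosen) are the Site values as character strings,
  # so each row is compared with the ID chosen for its own Site
  keep <- data$ID == chosen[as.character(data$Site)]
  data[keep, ]
}

set.seed(1)  # only to make the example reproducible
result <- sample_one_id_per_site(DF)
```

Inside the loop, the same call could be applied to whatever subset of Sites the current iteration produces; dropping `set.seed()` gives a fresh draw each time.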