Random sampling over XY coordinates in R (or in MATLAB?)


My data frame has the following four columns: type ("A" or "B"), xvar, longitude, and latitude. It looks like:

      type    xvar    longitude    latitude
[1,]   A       20      -87.81        40.11
[2,]   A       12      -87.82        40.12
[3,]   A       50      -87.85        40.22
....
[21,]  B       24      -87.79        40.04
[22,]  B       30      -87.88        40.10
[23,]  B       12      -87.67        40.32
[24,]  B       66      -87.66        40.44
....

I have 20 rows with type = "A" and 25,000 rows with type = "B". My task is to randomly assign the xvar values of the 20 "A" data points to X-Y locations of type "B", without replacement. For example, xvar = 20 from the first "A" observation could be placed at row [22,], that is, (-87.88, 40.10). Because I am sampling without replacement, in theory I can repeat this 25,000/20 = 1,250 times. I want 1,000 replications.

I also have a function, say myfunc(xvar, longitude, latitude), that returns one statistic from one random sample. I first create an empty 1,000 x 1 matrix, say myresult.

myresult <- array(0,dim=c(1000,1))

Then, for each random sample, I apply my function (myfunc) to calculate the statistic.

for (i in seq_len(1000)) {
  # draw one sample with three variables: xvar, longitude, latitude
  # apply my function to the selected sample
  # store the calculated statistic in myresult[i, ]
}

I wonder how to do this in R (and maybe in MATLAB?). Thanks!

=============================================================

Update: @user. Borrowing your idea, the following is what I want:

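dd1 <- df[df$type == "B", ]  # all type B rows
dd2 <- df[df$type == "A", ]  # all type A rows
v   <- dd2[sample(nrow(dd2), nrow(dd2)), ]  # shuffle the A rows
randomXvarOfA <- as.matrix(v[, "xvar"])     # shuffled xvar values as a one-column matrix
cols <- c("longitude", "latitude")
# pick as many random B locations as there are A rows (without replacement)
B_shuffled_XY <- dd1[, cols][sample(nrow(dd1), nrow(dd2)), ]
dimnames(randomXvarOfA) <- list(NULL, "xvar")
sampledData <- cbind(randomXvarOfA, B_shuffled_XY)
sampledData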

   xvar longitude latitude
4   20    -87.79    40.04
7   12    -87.66    40.44
5   50    -87.88    40.10

Solution

  • Read in your data:

      df <- read.table(text = "
          type    xvar    longitude    latitude
          A       20      -87.81        40.11
          A       12      -87.82        40.12
          A       50      -87.85        40.22
          B       24      -87.79        40.04
          B       30      -87.88        40.10
          B       12      -87.67        40.32
          B       66      -87.66        40.44", header = TRUE)
    

    I first wrote this without splitting and it got messy, so I decided to split your data.frame by type.

        dd1 <- df[df$type == "B" ,]  # get all rows of just type B
        dd2 <- df[df$type == "A" ,]  # get all rows of just type A
    
        v   <- dd2[sample(nrow(dd2), 2), ]  # sample two type A rows at random
        # if you want to sample 20 rows, change the 2 to a 20
    
        cols <- c("longitude", "latitude")
        dd1[,cols][sample(nrow(dd1), 2), ] <- v[,cols]
        # overwrite the long/lat of 2 random type B rows with the long/lat of the sampled type A rows
    
    
    # put the As and Bs back together
    rbind(dd2,dd1)
    #  type xvar longitude latitude
    # 1    A   20    -87.81    40.11
    # 2    A   12    -87.82    40.12
    # 3    A   50    -87.85    40.22
    # 4    B   24    -87.79    40.04
    # 5    B   30    -87.85    40.22
    # 6    B   12    -87.81    40.11
    # 7    B   66    -87.66    40.44
    

    As you can see, rows 5 and 6 (type B) now have longitude and latitude values randomly selected from type A rows. I did not change the xvar values, since I don't know whether you want that. If you want to change xvar too, set cols <- c("xvar", "longitude", "latitude").
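To illustrate the variant that also overwrites xvar, here is a minimal sketch on the same toy data. The index vector `rows` is a name introduced here for readability and is not part of the original code; the row selection is random, so the output differs between runs.

```r
df <- read.table(text = "
type xvar longitude latitude
A 20 -87.81 40.11
A 12 -87.82 40.12
A 50 -87.85 40.22
B 24 -87.79 40.04
B 30 -87.88 40.10
B 12 -87.67 40.32
B 66 -87.66 40.44", header = TRUE)

dd1 <- df[df$type == "B", ]  # type B rows
dd2 <- df[df$type == "A", ]  # type A rows

# include xvar so the value travels together with its coordinates
cols <- c("xvar", "longitude", "latitude")

v    <- dd2[sample(nrow(dd2), 2), ]  # two random type A rows
rows <- sample(nrow(dd1), 2)         # two random type B rows to overwrite

dd1[rows, cols] <- v[, cols]

rbind(dd2, dd1)
```

The `type` column is left alone, so the overwritten rows are still labelled "B".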

    Inside a function it would look like:

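    changestuff <- function(x) {
        dd1 <- x[x$type == "B", ]  # get just the type B rows
        dd2 <- x[x$type == "A", ]  # get just the type A rows
        v   <- dd2[sample(nrow(dd2), 2), ]  # two random type A rows
        cols <- c("longitude", "latitude")
        # overwrite the coordinates of two random B rows with those of the sampled A rows
        dd1[, cols][sample(nrow(dd1), 2), ] <- v[, cols]
        rbind(dd2, dd1)  # put the As and Bs back together
    }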
    
    changestuff(df)
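
The question also asks for 1,000 replications, each placing the 20 "A" xvar values on 20 randomly chosen "B" locations. One way this might be done is sketched below. Several things here are assumptions made only so the sketch runs: the data are simulated stand-ins, `myfunc` is a hypothetical placeholder for the real statistic, and `a_xvar`, `b_xy`, `idx`, and `n_rep` are names introduced for illustration. The sketch reads "without replacement" as meaning that no B location is reused across replications either, which is what makes the 25,000/20 = 1,250 limit apply.

```r
set.seed(1)  # only to make the sketch reproducible

# Simulated stand-in for the real data (assumption): 20 "A" rows, 25,000 "B" rows
n_a <- 20
n_b <- 25000
df <- data.frame(
  type      = rep(c("A", "B"), times = c(n_a, n_b)),
  xvar      = sample(1:100, n_a + n_b, replace = TRUE),
  longitude = runif(n_a + n_b, min = -88, max = -87.5),
  latitude  = runif(n_a + n_b, min = 40, max = 40.5)
)

# Hypothetical placeholder for myfunc(); replace with the real statistic
myfunc <- function(xvar, longitude, latitude) {
  weighted.mean(latitude, w = xvar)
}

a_xvar <- df$xvar[df$type == "A"]
b_xy   <- df[df$type == "B", c("longitude", "latitude")]

n_rep <- 1000

# Draw n_rep * 20 distinct B rows in one call, one column per replication,
# so no B location is reused within or across replications
idx <- matrix(sample(nrow(b_xy), n_rep * n_a), nrow = n_a)

myresult <- numeric(n_rep)
for (i in seq_len(n_rep)) {
  rows <- idx[, i]
  myresult[i] <- myfunc(a_xvar, b_xy$longitude[rows], b_xy$latitude[rows])
}

summary(myresult)
```

If the replications are allowed to share B locations, drawing `sample(nrow(b_xy), n_a)` inside the loop would be enough, and the number of replications is then no longer capped at 1,250.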