Randomly sample a grid without replacement


I wanted to randomly sample cells from a data set without replacement and thought it would be easy. It was not, and I could not find R code for it online. Eventually I got the code below to work. It seems overly complex, but it does appear to give the right result.

set.seed(1234)

n.samples <- 10

my.grid <- read.table(text = '
state county y2000 y2001 y2002 y2003 y2004 y2005 y2006
  A      A      5    10    15    20    25    30    35
  A      B     15    20    25    30    35    40    45
  A      C     45    40    35    30    25    20    15
  A      Q      1     2     3     4     5     6     7
  B      A      9     8     7     6     5     4     3 
  B      B     90    91    92    93    94    95    96
  B      G     10    20    30    40    50    60    70
  B      H    100   200   300   400   500   600   700
  C      J    900   850   800   750   700   650   600
  C      K      2     4     6     8    10    12    14
  C      M      3     6     9    12    15    18    21
  C      P     50    45    40    35    30    25    20
', header = TRUE)

my.grid

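# One row per (row, column) cell of the measurement columns (3 onward).
population <- expand.grid(row = seq_len(nrow(my.grid)),
                          col = 3:ncol(my.grid))

# Draw n.samples distinct cells; named 'picked' so it does not mask sample().
picked <- sample(nrow(population), n.samples, replace = FALSE)

use.these <- population[picked, ]
use.these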

measurement <- rep(NA, nrow(use.these))
my.area     <- my.grid[use.these[,1], c(1:2)]
my.year     <- names(my.grid)[use.these[,2]]

for(i in 1:nrow(use.these)) {

   measurement[i] <- my.grid[use.these[i,1], use.these[i,2]]

}

my.samples <- data.frame(use.these, my.area, my.year, measurement)
my.samples

Output for my.samples:

   row col state county my.year measurement
10  10   3     C      K   y2000           2
52   4   7     A      Q   y2004           5
50   2   7     A      B   y2004          35
51   3   7     A      C   y2004          25
69   9   8     C      J   y2005         650
81   9   9     C      J   y2006         600
1    1   3     A      A   y2000           5
18   6   4     B      B   y2001          91
79   7   9     B      G   y2006          70
39   3   6     A      C   y2003          30

Is there a better way, particularly in base R? I have heard of the sampling package. Since my code seems to work and I am only asking for possible better approaches, perhaps I should not post this here, although it seems like a common and important topic. If this is not an appropriate post, I can remove it and place the code on my Wikipedia user page. Thank you for any suggestions.


Solution

  • When I tried DWin's answer, I found that it returned the correct measurement but did not seem to return the correct state, county or year. I modified DWin's code as follows, and it seems to return the same answers as the code in my original post. I debated whether to write a comment or post a second answer. I can delete this answer if others deem that appropriate after DWin reviews it.

    set.seed(1234)
    
    n.samples <- 10
    
    my.grid <- read.table(text = '
    state county y2000 y2001 y2002 y2003 y2004 y2005 y2006
      A      A      5    10    15    20    25    30    35
      A      B     15    20    25    30    35    40    45
      A      C     45    40    35    30    25    20    15
      A      Q      1     2     3     4     5     6     7
      B      A      9     8     7     6     5     4     3 
      B      B     90    91    92    93    94    95    96
      B      G     10    20    30    40    50    60    70
      B      H    100   200   300   400   500   600   700
      C      J    900   850   800   750   700   650   600
      C      K      2     4     6     8    10    12    14
      C      M      3     6     9    12    15    18    21
      C      P     50    45    40    35    30    25    20
    ', header = TRUE)
    
    my.grid
    
    mat <- as.matrix(my.grid[,3:ncol(my.grid)])
    mat
    
    size <- length(mat)
    
    picks <- sample(size, n.samples)
    picks
    
    # [1] 10 52 50 51 69 81  1 18 79 39
    
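    # picks are column-major linear indices into mat. Subtract 1 before dividing
    # so that a pick in the last row (a multiple of nrow(mat)) maps correctly.
    my.row    <- (picks - 1) %% nrow(mat) + 1
    my.column <- 2 + ((picks - 1) %/% nrow(mat) + 1)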
    
    my.samples2 <- cbind(my.row, my.column, my.grid[my.row, 1:2], names(my.grid)[my.column], mat[picks])
    names(my.samples2) <- c('row','column','state','county','year','measurement')
    my.samples2
    

    Gives:

        row column state county  year measurement
    10   10      3     C      K y2000           2
    4     4      7     A      Q y2004           5
    2     2      7     A      B y2004          35
    3     3      7     A      C y2004          25
    9     9      8     C      J y2005         650
    9.1   9      9     C      J y2006         600
    1     1      3     A      A y2000           5
    6     6      4     B      B y2001          91
    7     7      9     B      G y2006          70
    3.1   3      6     A      C y2003          30
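
The conversion from a sampled position to a row and column can also be left to base R. The sketch below is one possible way to do it, not DWin's code: it uses `arrayInd()` to turn linear indices into (row, column) pairs. The helper name `lookup_cells` and its `n.id` argument are made up for illustration, and it assumes the identifier columns (state, county) come first in the data frame. Note that the default algorithm behind `sample()` changed in R 3.6.0, so `set.seed(1234)` on a current R will probably not reproduce the picks shown above; the posted picks are therefore passed in directly for comparison.

```r
my.grid <- read.table(text = '
state county y2000 y2001 y2002 y2003 y2004 y2005 y2006
  A      A      5    10    15    20    25    30    35
  A      B     15    20    25    30    35    40    45
  A      C     45    40    35    30    25    20    15
  A      Q      1     2     3     4     5     6     7
  B      A      9     8     7     6     5     4     3
  B      B     90    91    92    93    94    95    96
  B      G     10    20    30    40    50    60    70
  B      H    100   200   300   400   500   600   700
  C      J    900   850   800   750   700   650   600
  C      K      2     4     6     8    10    12    14
  C      M      3     6     9    12    15    18    21
  C      P     50    45    40    35    30    25    20
', header = TRUE)

# Hypothetical helper: look up the cells addressed by linear indices 'picks'.
# Assumes the first n.id columns of 'grid' are identifiers, the rest values.
lookup_cells <- function(grid, picks, n.id = 2) {
  mat <- as.matrix(grid[, -seq_len(n.id), drop = FALSE])
  pos <- arrayInd(picks, dim(mat))          # column 1 = row, column 2 = col
  out <- data.frame(row         = pos[, 1],
                    column      = pos[, 2] + n.id,   # column in the full grid
                    grid[pos[, 1], seq_len(n.id), drop = FALSE],
                    year        = colnames(mat)[pos[, 2]],
                    measurement = mat[picks])
  rownames(out) <- NULL                     # drop names such as "9.1"
  out
}

# Sample 10 distinct cells (sample() draws without replacement by default).
n.cells     <- nrow(my.grid) * (ncol(my.grid) - 2)
my.samples3 <- lookup_cells(my.grid, sample(n.cells, 10))

# The picks reported in the post, for comparison with the output above.
posted <- lookup_cells(my.grid, c(10, 52, 50, 51, 69, 81, 1, 18, 79, 39))
posted
```

With the posted picks this should give the same state, county, year and measurement as the two tables above. Because `arrayInd()` handles the index arithmetic, a pick that falls in the last row of the grid (for example 12) needs no special treatment.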