downsampling a matrix in R


I have a matrix Q that is relatively high-dimensional (100 × 500,000), and I want to downsample it. An example explains what I mean by downsampling.

Let Q =

1 4 9
3 2 1

and let the downsample size be n. I want to draw n balls from a jar of sum(Q) = 20 balls, where each ball has one of 6 colors, one for each index pair (cell) of the matrix. It's as if I have 1 ball of color A, 4 balls of color B, and so on, and I'm drawing n balls without replacement.

I want the result returned in the same format, as a matrix. For example, one possible return value of downsample(Q, 3) is

0 0 2
1 0 0

My approach is trying to use sample:

sample(length(as.vector(Q)), size=n, replace=FALSE, prob = as.vector(Q))

However, the problem with this is that sample treats 1:length(as.vector(Q)) as all the balls I have, so I can't draw more than length(as.vector(Q)) balls, since I'm sampling without replacement.

So to adapt my method, I would need to update prob after every draw by subtracting 1 from the entry that was drawn, and call sample one draw at a time in a for loop of some sort. That doesn't sound like nice code.
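
For reference, a minimal sketch of that loop-based idea might look like the following. The function name downsample_loop and its variables are hypothetical and used only for illustration, and the sketch assumes Q holds non-negative integer counts.

```r
# Hypothetical loop-based downsampling: draw one ball at a time and
# update the remaining counts after each draw.
downsample_loop <- function(Q, n) {
  stopifnot(n <= sum(Q))
  remaining <- as.vector(Q)            # balls still in the jar, per cell
  drawn <- numeric(length(remaining))  # balls drawn so far, per cell
  for (i in seq_len(n)) {
    # Pick one cell with probability proportional to its remaining count.
    k <- sample.int(length(remaining), size = 1, prob = remaining)
    remaining[k] <- remaining[k] - 1   # remove that ball from the jar
    drawn[k] <- drawn[k] + 1
  }
  dim(drawn) <- dim(Q)                 # back to the shape of Q
  drawn
}

Q <- matrix(c(1, 4, 9,
              3, 2, 1), nrow = 2, byrow = TRUE)
set.seed(1)
D <- downsample_loop(Q, 3)
```

This makes n separate calls to sample.int, which is what I would like to avoid.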

Is there a better way to do this in an R-friendly, no-for-loop way?


Solution

  • It's a little inefficient, but if sum(Q) isn't too large you can do this by disaggregating/replicating the vector and then sampling, then reaggregating/tabulating.

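    ## counts per category, named so the draws can be tabulated by label
    Q <- setNames(c(1,4,9,3,2,1),LETTERS[1:6])
    n <- 10
    set.seed(101)
    ## one label per ball, then draw n of them without replacement
    s0 <- sample(rep(names(Q),Q),
           size=n,replace=FALSE)
    ## factor() with all levels keeps categories that were never drawn as zeros
    Q2 <- table(factor(s0,levels=names(Q)))
    Q2
    ## A B C D E F
    ## 1 2 5 1 0 1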
    

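    I'm not sure about your matrix structure. If Q is your original matrix and the counts come from as.vector(Q), which flattens the matrix column by column, you could use dim(Q2) <- dim(Q) to reorganize the results in the same order as your original matrix ...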
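
As a rough sketch of how the replicate, sample, and tabulate steps could be wrapped up for a matrix input: the function name downsample is taken from the question, while the body below is only an illustration that assumes Q contains non-negative integer counts. It labels balls by cell index instead of by name and uses tabulate() in place of table().

```r
# Hypothetical helper: downsample a matrix of non-negative integer counts.
downsample <- function(Q, n) {
  stopifnot(n <= sum(Q))
  # One entry per ball, labelled by the column-major index of its cell.
  balls <- rep(seq_along(Q), times = as.vector(Q))
  # Sample positions, not values, so a jar holding a single ball is safe
  # (sample(x, ...) treats a length-one numeric x as 1:x).
  drawn <- balls[sample.int(length(balls), size = n)]
  # Count draws per cell; nbins keeps cells that were never drawn as zeros.
  out <- tabulate(drawn, nbins = length(Q))
  dim(out) <- dim(Q)  # restore the original matrix shape
  out
}

Q <- matrix(c(1, 4, 9,
              3, 2, 1), nrow = 2, byrow = TRUE)
set.seed(101)
D <- downsample(Q, 3)
```

D has the same dimensions as Q, its entries sum to 3, and no entry exceeds the corresponding entry of Q. The intermediate vector balls has sum(Q) elements, so the caveat above about sum(Q) not being too large still applies.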