I have a relatively high-dimensional matrix Q (100 x 500,000) that I want to downsample. Here is an example of what I mean by downsampling.
Let Q =
1 4 9
3 2 1
and let the downsample size be n. I want to draw n balls from a jar of sum(Q) = 20 balls, where each ball has one of 6 colors, one for each index pair of the matrix. It's as if I have 1 ball of color A, 4 balls of color B, etc., and I draw n balls without replacement.
I want the result returned in the same format, as a matrix. For example, one possible return value of downsample(Q, 3) is
0 0 2
1 0 0
My attempt uses sample:
sample(length(as.vector(Q)), size=n, replace=FALSE, prob = as.vector(Q))
However, the problem is that sample treats 1:length(as.vector(Q)) as all the balls I have, so I can't draw more than length(as.vector(Q)) balls without replacement.
To adapt my method, I would need to subtract 1 from the drawn entry of prob after each draw and call sample one ball at a time in a for loop of some sort, which doesn't sound like nice code.
Is there a better way to do this in an R-friendly way, without a for loop?
It's a little inefficient, but if sum(Q) isn't too large you can do this by disaggregating/replicating the vector, sampling, and then reaggregating/tabulating.
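Q <- setNames(c(1,4,9,3,2,1), LETTERS[1:6])  ## counts per color
n <- 10                                       ## number of balls to draw
set.seed(101)
## one label per ball, then draw n of them without replacement
s0 <- sample(rep(names(Q), Q),
             size=n, replace=FALSE)
## tabulate the draws; the factor levels keep colors that were never drawn
Q2 <- table(factor(s0, levels=names(Q)))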
## A B C D E F
## 1 2 5 1 0 1
I'm not sure about your matrix structure. With Q being your original matrix, you could use dim(Q2) <- dim(Q) to reorganize the results in the same order as your original matrix ...
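For a matrix input, the same replicate/sample/tabulate idea can be wrapped in a function. This is only a sketch: the name downsample comes from the question, but the body is an assumption about what is wanted, and it assumes Q holds non-negative integer counts. It relies on rep(), as.vector() and matrix() all using column-major order, so counts land back in their original cells. Like the approach above, it allocates a vector of length sum(Q), so it is only practical when that total fits in memory.

```r
# Hypothetical helper (not from the original post): draw n balls without
# replacement from a jar described by the count matrix Q, and return the
# drawn counts as a matrix of the same shape.
downsample <- function(Q, n) {
  stopifnot(all(Q >= 0), all(Q == round(Q)), n <= sum(Q))
  # One entry per ball, labelled by the (column-major) cell it belongs to.
  balls <- rep(seq_along(Q), times = as.vector(Q))
  # Sample positions rather than values, so a jar holding a single ball
  # is not misread by sample() as the range 1:x.
  drawn <- balls[sample.int(length(balls), size = n, replace = FALSE)]
  # Count draws per cell; nbins keeps cells that were never drawn.
  counts <- tabulate(drawn, nbins = length(Q))
  matrix(counts, nrow = nrow(Q), ncol = ncol(Q), dimnames = dimnames(Q))
}

Q <- matrix(c(1, 4, 9, 3, 2, 1), nrow = 2, byrow = TRUE)
set.seed(101)
D <- downsample(Q, 3)
D
```

Each entry of D is at most the matching entry of Q, and sum(D) equals n. Drawing all sum(Q) balls returns Q itself.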