Suppose I am allowed to distribute 100% of the weight along an 5-length vector. However, I can't put weights into two adjacent values and no value can be more than 50%.
For example,
[0, .5, 0, 0, .5] is good
[.5, .5, 0, 0,0] is not good
[.2, 0, .2, 0, .6] is good
[.2, 0, .2, .2, .2] is not good
I'd like to generate say 10,000 such vectors from which to run a monte carlo simulation.
I'm thinking I can do this with expand.grid
but I'm not quite sure how.
I can generate a random one and then:
nonzero_weights = which(starting_weights>0)
grid_positions = expand.grid(startingPos = nonzero_weights, endingPos = nonzero_weights)
And then do some filtering and dropping but that seems messy. Why generate if I don't need them. Is there a cleaner way to do this?
If we did not have the adjacency restriction, this problem would not be that difficult with the tools currenty available in R
(see this answer more info). With the adjacency restriction, we have to do a little more work to get our desired result.
We first note that since we cannot have 2 consecutive numbers in a row of a vector with n columns (the OP clarified in the comments that they need n = 11 so we will use this as our test case), that the maximum number of columns with a value is equal to 11 - floor(11 / 2) = 6
. This occurs when the values are present in the columns 1 3 5 7 9 11
. We should also note that since the maximum value is capped at 0.5 and we need the row to sum to 1, that the minimum number of columns with a value is equal to 2 since ceiling(1 / 0.5) = 2
. With this information, we can begin our attack.
We first generate every combination of 11 choose 2 through 6. We then sift out combinations that violate the adjacency restriction. The latter part can easily be achieved by taking the diff
of every row and checking if any of the resulting differences is equal to 1. Observe (N.B. we use RcppAlgos
(I am the author) for all computations):
library(RcppAlgos)
vecLen <- 11L
lowComb <- as.integer(ceiling(1 / 0.5))
highComb <- 6L
numCombs <- length(lowComb:highComb)
allCombs <- lapply(lowComb:highComb, function(x) {
comboGeneral(vecLen, x)
})
validCombs <- lapply(allCombs, function(x) {
which(apply(x, 1, function(y) {
!any(diff(y) == 1L)
}))
})
combLen <- lengths(validCombs)
combLen
[1] 45 84 70 21 1
## subset each matrix of combinations using the
## vector of validCombs obtained above
myCombs <- lapply(seq_along(allCombs), function(x) {
allCombs[[x]][validCombs[[x]], ]
})
We now need to find all combinations of seq(0.05, 0.5, 0.05)
that sum to 1 for every possible length calculated above. Using the restraint features of comboGeneral
, this is an easy task:
combSumOne <- lapply(lowComb:highComb, function(x) {
comboGeneral(seq(5L,50L,5L), x, TRUE,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 100L) / 100
})
groupLen <- sapply(combSumOne, nrow)
groupLen
1 13 41 66 78
Now, we create a matrix with our desired number of columns and fill it will all possible combinations, using myCombs
above to ensure the adjancency requirement is met.
myCombMat <- matrix(0L, nrow = sum(groupLen * combLen), ncol = vecLen)
s <- g <- 1L
e <- combRow <- nrow(combSumOne[[1L]])
for (a in myCombs[-numCombs]) {
for (i in 1:nrow(a)) {
myCombMat[s:e, a[i, ]] <- combSumOne[[g]]
s <- e + 1L
e <- e + combRow
}
e <- e - combRow
g <- g + 1L
combRow <- nrow(combSumOne[[g]])
e <- e + combRow
}
## the last element in myCombs is simply a
## vector, thus nrow would return NULL
myCombMat[s:e, myCombs[[numCombs]]] <- combSumOne[[g]]
Here is a glimpse of the output:
head(myCombMat)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0.5 0 0.5 0.0 0.0 0.0 0.0 0.0 0 0 0
[2,] 0.5 0 0.0 0.5 0.0 0.0 0.0 0.0 0 0 0
[3,] 0.5 0 0.0 0.0 0.5 0.0 0.0 0.0 0 0 0
[4,] 0.5 0 0.0 0.0 0.0 0.5 0.0 0.0 0 0 0
[5,] 0.5 0 0.0 0.0 0.0 0.0 0.5 0.0 0 0 0
[6,] 0.5 0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 0
tail(myCombMat)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[5466,] 0.10 0 0.10 0 0.20 0 0.20 0 0.20 0 0.20
[5467,] 0.10 0 0.15 0 0.15 0 0.15 0 0.15 0 0.30
[5468,] 0.10 0 0.15 0 0.15 0 0.15 0 0.20 0 0.25
[5469,] 0.10 0 0.15 0 0.15 0 0.20 0 0.20 0 0.20
[5470,] 0.15 0 0.15 0 0.15 0 0.15 0 0.15 0 0.25
[5471,] 0.15 0 0.15 0 0.15 0 0.15 0 0.20 0 0.20
set.seed(42)
mySamp <- sample(nrow(myCombMat), 10)
sampMat <- myCombMat[mySamp, ]
rownames(sampMat) <- mySamp
sampMat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
5005 0.00 0.05 0.00 0.05 0.00 0.15 0.00 0.35 0.00 0.4 0.00
5126 0.00 0.15 0.00 0.15 0.00 0.20 0.00 0.20 0.00 0.0 0.30
1565 0.10 0.00 0.15 0.00 0.00 0.00 0.25 0.00 0.00 0.5 0.00
4541 0.05 0.00 0.05 0.00 0.00 0.15 0.00 0.00 0.25 0.0 0.50
3509 0.00 0.00 0.15 0.00 0.25 0.00 0.25 0.00 0.00 0.0 0.35
2838 0.00 0.10 0.00 0.15 0.00 0.00 0.35 0.00 0.00 0.0 0.40
4026 0.05 0.00 0.10 0.00 0.15 0.00 0.20 0.00 0.50 0.0 0.00
736 0.00 0.00 0.10 0.00 0.40 0.00 0.00 0.00 0.00 0.0 0.50
3590 0.00 0.00 0.15 0.00 0.20 0.00 0.00 0.30 0.00 0.0 0.35
3852 0.00 0.00 0.00 0.05 0.00 0.20 0.00 0.30 0.00 0.0 0.45
all(rowSums(myCombMat) == 1)
[1] TRUE
As you can see, every row sums to 1 and has no adjacent values.
If you really want permutations, we can generate all permutations of seq(0.05, 0.5, 0.05)
that sum to 1 for every possible length (just like we did for the combination):
permSumOne <- lapply(lowComb:highComb, function(x) {
permuteGeneral(seq(5L,50L,5L), x, TRUE,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 100L) / 100
})
groupLenPerm <- sapply(permSumOne, nrow)
groupLenPerm
[1] 1 63 633 3246 10872
And use these to create our matrix of all possible permutations that sum to 1 and meet our adjacency requirement:
myPermMat <- matrix(0L, nrow = sum(groupLenPerm * combLen), ncol = vecLen)
s <- g <- 1L
e <- permRow <- nrow(permSumOne[[1L]])
for (a in myCombs[-numCombs]) {
for (i in 1:nrow(a)) {
myPermMat[s:e, a[i, ]] <- permSumOne[[g]]
s <- e + 1L
e <- e + permRow
}
e <- e - permRow
g <- g + 1L
permRow <- nrow(permSumOne[[g]])
e <- e + permRow
}
## the last element in myCombs is simply a
## vector, thus nrow would return NULL
myPermMat[s:e, myCombs[[numCombs]]] <- permSumOne[[g]]
And, once again, here is glimpse of the output:
head(myPermMat)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0.5 0 0.5 0.0 0.0 0.0 0.0 0.0 0 0 0
[2,] 0.5 0 0.0 0.5 0.0 0.0 0.0 0.0 0 0 0
[3,] 0.5 0 0.0 0.0 0.5 0.0 0.0 0.0 0 0 0
[4,] 0.5 0 0.0 0.0 0.0 0.5 0.0 0.0 0 0 0
[5,] 0.5 0 0.0 0.0 0.0 0.0 0.5 0.0 0 0 0
[6,] 0.5 0 0.0 0.0 0.0 0.0 0.0 0.5 0 0 0
tail(myPermMat)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[128680,] 0.15 0 0.20 0 0.20 0 0.15 0 0.15 0 0.15
[128681,] 0.20 0 0.15 0 0.15 0 0.15 0 0.15 0 0.20
[128682,] 0.20 0 0.15 0 0.15 0 0.15 0 0.20 0 0.15
[128683,] 0.20 0 0.15 0 0.15 0 0.20 0 0.15 0 0.15
[128684,] 0.20 0 0.15 0 0.20 0 0.15 0 0.15 0 0.15
[128685,] 0.20 0 0.20 0 0.15 0 0.15 0 0.15 0 0.15
all(rowSums(myPermMat) == 1)
[1] TRUE
And, as the OP states, if we want to randomly pick 10000 of these we can use sample
to achieve this:
set.seed(101)
mySamp10000 <- sample(nrow(myPermMat), 10000)
myMat10000 <- myPermMat[mySamp10000, ]
rownames(myMat10000) <- mySamp10000
head(myMat10000)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
47897 0.00 0.0 0.00 0.50 0.0 0.25 0.0 0.00 0.05 0.0 0.20
5640 0.25 0.0 0.15 0.00 0.1 0.00 0.5 0.00 0.00 0.0 0.00
91325 0.10 0.0 0.00 0.15 0.0 0.40 0.0 0.00 0.20 0.0 0.15
84633 0.15 0.0 0.00 0.35 0.0 0.30 0.0 0.10 0.00 0.1 0.00
32152 0.00 0.4 0.00 0.05 0.0 0.00 0.0 0.25 0.00 0.3 0.00
38612 0.00 0.4 0.00 0.00 0.0 0.35 0.0 0.10 0.00 0.0 0.15
As RcppAlgos
is highly efficient, all steps above return instantly. On my 2008 Windows machine i5 2.5 GHz, the entire generation (including the permutations) takes less than 0.04 seconds.