I want to split a vector into subvectors with the following conditions:
Each sub-vector has an equal length l, which is less than the number of elements of the parent vector v.
Each sub-vector is unique in its elements' composition and contains consecutive elements.
Elements of a particular sub-vector overlap with elements of the previous and subsequent sub-vectors.
No subvector must start at a position of the parent vector that is divisible by l. For instance, if l = 2, no subvector must start at position 2, 4, 6, 8, 10, 12, ..., n; for l = 3, no subvector must start at position 3, 6, 9, 12, 15, 18, ..., n; for l = 4, no subvector must start at position 4, 8, 12, 16, 20, 24, ..., n; etc.
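Condition 4 can be read as a filter on the start positions. The following is only a sketch under that reading, assuming 1-based positions; the helper name `valid_starts` is hypothetical and not part of any package:

```r
# Hypothetical helper: start positions permitted by condition 4.
# n is the length of the parent vector, l the sub-vector length.
valid_starts <- function(n, l) {
  stopifnot(l > 1, l < n)
  starts <- seq_len(n - l + 1)  # every start that leaves room for l elements
  starts[starts %% l != 0]      # drop the positions divisible by l
}

valid_starts(10, 3)  # 1 2 4 5 7 8
valid_starts(10, 2)  # 1 3 5 7 9
```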
The input should be a vector for the parent vector v, and an integer for the block length l. The output should be a list of vectors (not a matrix), with each sub-vector returned as its own vector.
The code below shows a case where condition 4 above is not applied.
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # the parent vector
l <- 3 # constant length of sub-vectors to be
m <- length(v) - l + 1 # number of sub-vector to be
split(t(embed(v, m))[m:1,], 1:m)
$`1`
[1] 1 2 3
$`2`
[1] 2 3 4
$`3`
[1] 3 4 5
$`4`
[1] 4 5 6
$`5`
[1] 5 6 7
$`6`
[1] 6 7 8
$`7`
[1] 7 8 9
$`8`
[1] 8 9 10
The result of the above code then has to be fixed by manually removing the subvectors that violate condition 4 above.
I know that my number of subvectors should be length(v) - l + 1 - floor((length(v) - l + 1)/l)
but when I tried the code below:
What I Want
$`1`
[1] 1 2 3
$`2`
[1] 2 3 4
$`3`
[1] 4 5 6
$`4`
[1] 5 6 7
$`5`
[1] 7 8 9
$`6`
[1] 8 9 10
The result must satisfy condition 4 and all the other conditions.
For illustration, consider a parent vector of x1 to x10 with a subvector size of l = 3 consecutive elements of its parent vector as follows:
x1, x2, x3
x2, x3, x4
x4, x5, x6
x5, x6, x7
x7, x8, x9
x8, x9, x10
What I do is form a series of subvectors, each of length l = 3, with starting elements that are progressive in nature (x1, x2, x4, x5, x7, x8, x10) and not recursive. The third sub-vector starts from x4 and not x3, because x3 sits at position 3 of the original vector, which is divisible by l = 3. The same consideration is applied to the 6th and the supposed 7th sub-vector.
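The illustration can be reproduced with labels in place of numbers, which makes it clear that the rule depends on the position of the starting element rather than on its value. This is a sketch only; the names `x`, `starts` and `subs` are chosen here for illustration:

```r
# Illustration only: character labels stand in for the elements x1..x10.
x <- paste0("x", 1:10)
l <- 3
m <- length(x) - l + 1                 # 8 candidate start positions
starts <- which(seq_len(m) %% l != 0)  # positions not divisible by l
subs <- lapply(starts, function(i) x[i:(i + l - 1)])

subs[[3]]                    # "x4" "x5" "x6"
length(subs) == m - m %/% l  # TRUE: the count agrees with the formula above
```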
How I Need It
I need R code that gives me the output I want according to the conditions above. You can use v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) as the parent vector input, with your choice of 1 < l < length(v), to test your R code.
One possibility would be to create an empty list and fill in each subvector only if its starting position is not divisible by l. Then we remove all NULL elements from the created list.
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # the parent vector
l <- 3 # constant length of the sub-vectors
m <- length(v) - l + 1 # number of candidate start positions
li <- vector("list", m)
for (i in seq_len(m)) {
  # condition 4 concerns the start position i, not the value v[i]
  if (i %% l != 0) {
    li[[i]] <- v[i:(i + l - 1)]
  }
}
> Filter(Negate(is.null),li)
[[1]]
[1] 1 2 3
[[2]]
[1] 2 3 4
[[3]]
[1] 4 5 6
[[4]]
[1] 5 6 7
[[5]]
[1] 7 8 9
[[6]]
[1] 8 9 10
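Or as a function:
kmers <- function(v, k) {
  m <- length(v) - k + 1 # number of candidate start positions
  li <- vector("list", m)
  for (i in seq_len(m)) {
    # keep a window only if its start position is not divisible by k
    if (i %% k != 0) {
      li[[i]] <- v[i:(i + k - 1)]
    }
  }
  Filter(Negate(is.null), li) # drop the slots that were skipped
}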
> kmers(v,3)
[[1]]
[1] 1 2 3
[[2]]
[1] 2 3 4
[[3]]
[1] 4 5 6
[[4]]
[1] 5 6 7
[[5]]
[1] 7 8 9
[[6]]
[1] 8 9 10
This is not a very typical R solution, and maybe there is something more elegant, but it's not a very typical R problem either.
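One possibly more vectorised variant builds on the `embed()` call from the question and drops the unwanted windows by row index. This is only a sketch, not a tested replacement for the loop above; the function name `kmers_embed` is made up for this example:

```r
# Hypothetical alternative: build all windows with embed(), then keep
# only the rows whose start position is not divisible by k.
kmers_embed <- function(v, k) {
  stopifnot(k > 1, k < length(v))
  m <- length(v) - k + 1
  starts <- seq_len(m)
  keep <- starts[starts %% k != 0]
  # embed() stores each window reversed, one per row; k:1 restores the order
  w <- embed(v, k)[keep, k:1, drop = FALSE]
  split(w, row(w))  # one list element per row, named "1", "2", ...
}

kmers_embed(1:10, 3)
```

`split(w, row(w))` returns a named list of plain vectors rather than a matrix, which matches the output format asked for in the question.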