
Split a Vector into a Number of Overlapping Subvectors with the Following Conditions


I want to split a vector into subvectors with the following conditions:

  1. Each sub-vector has an equal length l, which is less than the length of the parent vector v.

  2. Each sub-vector is unique in its elements' composition and contains consecutive elements.

  3. Elements of a particular sub-vector overlap with elements of the previous and subsequent sub-vectors.

  4. No subvector may start at a position of the parent vector that is divisible by l. For instance, if l=2 no subvector may start at position 2, 4, 6, 8, 10, 12, ..., n; for l=3 no subvector may start at position 3, 6, 9, 12, 15, 18, ..., n; for l=4 no subvector may start at position 4, 8, 12, 16, 20, 24, ..., n; and so on.

  5. The input should be a vector for the parent vector v and an integer for the block length l. The output should be a list of vectors (not a matrix), such that each sub-vector is output as a vector and the collection of all sub-vectors is a list.

The code below shows a case where condition 4 above is not applied.

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # the parent vector
l <- 3 # constant length of sub-vectors to be
m <- length(v) - l + 1 # number of sub-vector to be
split(t(embed(v, m))[m:1,], 1:m)

$`1`
[1] 1 2 3

$`2`
[1] 2 3 4

$`3`
[1] 3 4 5

$`4`
[1] 4 5 6

$`5`
[1] 5 6 7

$`6`
[1] 6 7 8

$`7`
[1] 7 8 9

$`8`
[1]  8  9 10

The result of the code above would then have to be worked on by manually removing the subvectors that violate condition 4 above.

I know that the number of subvectors should be length(v) - l + 1 - floor((length(v) - l + 1)/l), but my attempts to produce them in code did not give the result I want.

What I Want

$`1`
[1] 1 2 3

$`2`
[1] 2 3 4

$`3`
[1] 4 5 6

$`4`
[1] 5 6 7

$`5`
[1] 7 8 9

$`6`
[1]  8  9 10

The result must satisfy condition 4 as well as every other condition.
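As a quick sanity check, the count formula quoted above can be compared with a direct count of the allowed starting positions. This is only a sketch under the assumption that a "position" means the index in the parent vector; the names n_formula and n_counted are made up for illustration.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # the parent vector
l <- 3                                # length of each sub-vector
m <- length(v) - l + 1                # candidate starting positions are 1..m

# Count from the formula: all candidates minus the multiples of l among them
n_formula <- m - floor(m / l)

# Count obtained by testing every candidate starting position directly
n_counted <- sum(seq_len(m) %% l != 0)

c(formula = n_formula, counted = n_counted)
```

Both values come out as 6 for this input, which matches the six sub-vectors listed under "What I Want".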

For illustration, consider a parent vector of x1 to x10 with a subvector size of l = 3 consecutive elements of its parent vector as follows:

x1, x2, x3
    x2, x3, x4
            x4, x5, x6
                x5, x6, x7
                        x7, x8, x9
                            x8, x9, x10

What I do is form a series of subvectors, each of length l = 3, whose starting elements progress through the parent vector (x1, x2, x4, x5, x7, x8) and are not recursive. The third sub-vector starts from x4 and not x3 because x3 sits at position 3 of the original vector, which is divisible by l = 3. The same consideration applies further along the vector: no sub-vector starts from x6.
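The choice of starting elements in the illustration can be reproduced with a short sketch. It assumes the rule is applied to positions rather than values, and it uses a hypothetical character vector x so that the labels match the picture.

```r
x <- paste0("x", 1:10)   # labels x1 ... x10 standing in for the parent vector
l <- 3                   # length of each sub-vector
m <- length(x) - l + 1   # last position from which a full sub-vector still fits

# Keep only the starting positions that are not divisible by l
starts <- which(seq_len(m) %% l != 0)

x[starts]
# [1] "x1" "x2" "x4" "x5" "x7" "x8"
```

Positions 9 and 10 never appear because fewer than l elements remain from there, so a full-length sub-vector cannot start at them.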

How I Need It

I need R code that gives me the output I want according to the conditions above. You can use v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) as the parent vector, with your choice of 1 < l < length(v), when testing your R code.


Solution

  • One possibility would be to create an empty list and fill in each subvector only if its starting position is not divisible by l. Then we remove all NULL elements from the created list.

    v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # the parent vector
    l <- 3 # constant length of the sub-vectors
    m <- length(v) - l + 1 # number of candidate starting positions
    
    li <- vector("list", m)
    
    for (i in seq_len(m)) {
      # test the position i, not the value v[i], so that any parent vector works
      if (i %% l != 0) {
        li[[i]] <- v[i:(i + l - 1)]
      }
    }
    
    > Filter(Negate(is.null),li)
    [[1]]
    [1] 1 2 3
    
    [[2]]
    [1] 2 3 4
    
    [[3]]
    [1] 4 5 6
    
    [[4]]
    [1] 5 6 7
    
    [[5]]
    [1] 7 8 9
    
    [[6]]
    [1]  8  9 10
    
    

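    Or as a function:

    kmers <- function(v, k) {
      m <- length(v) - k + 1 # number of candidate starting positions
      li <- vector("list", m)
      for (i in seq_len(m)) {
        # skip starting positions that are divisible by k
        if (i %% k != 0) {
          li[[i]] <- v[i:(i + k - 1)]
        }
      }
      Filter(Negate(is.null), li) # drop the skipped (NULL) entries
    }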
    
    > kmers(v,3)
    [[1]]
    [1] 1 2 3
    
    [[2]]
    [1] 2 3 4
    
    [[3]]
    [1] 4 5 6
    
    [[4]]
    [1] 5 6 7
    
    [[5]]
    [1] 7 8 9
    
    [[6]]
    [1]  8  9 10
    
    

    This is not a very typical R solution; maybe there is something more elegant, but it is not a very typical R problem either.
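For a more vectorised flavour, one possible sketch computes the allowed starting positions first and then builds the list with lapply, so no NULL entries need filtering afterwards. The function name overlap_split and the input check are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical helper: split v into overlapping sub-vectors of length l,
# skipping every starting position that is divisible by l.
overlap_split <- function(v, l) {
  stopifnot(l > 1, l < length(v))        # the question asks for 1 < l < length(v)
  m <- length(v) - l + 1                 # last position where a full sub-vector fits
  starts <- which(seq_len(m) %% l != 0)  # allowed starting positions
  lapply(starts, function(i) v[i:(i + l - 1)])
}

res_num <- overlap_split(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3)
res_chr <- overlap_split(letters[1:7], 3)
```

Because the test is on positions rather than values, the same call also works for non-numeric input: res_chr holds the sub-vectors starting at "a", "b", "d" and "e".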