Search code examples
rmatrixsplittime-series

Splitting Matrix into multiple couples of columns in R


using R I have a 10 rows X 6 columns matrix. I need to split it into submatrices throuhg gruoping couples of columns with no overlapping.

i.e. matrix has columns A,B,C,D,E,F and I need to extract 3 different matrices (or data.frames or whatever object within financial packages like zoo or timeSeries) formed by columns AB, CD and EF.

PS: matrix contains financial data series and any couple of columns has a date columns and a NAV column


Solution

  • Using some dummy data (note you must have a dataframe as otherwise R would not allow you to hold Date and numeric values in a matrix [unless they were all converted to characters or raw numeric representations])

    set.seed(42)
    df <- data.frame(A = Sys.Date() + 0:9, B = rnorm(10),
                     C = Sys.Date() - 0:9, D = rnorm(10),
                     E = Sys.Date() - 20:29, F = rnorm(10))
    
    > head(df)
               A          B          C          D          E          F
    1 2013-04-05  1.3709584 2013-04-05  1.3048697 2013-03-16 -0.3066386
    2 2013-04-06 -0.5646982 2013-04-04  2.2866454 2013-03-15 -1.7813084
    3 2013-04-07  0.3631284 2013-04-03 -1.3888607 2013-03-14 -0.1719174
    4 2013-04-08  0.6328626 2013-04-02 -0.2787888 2013-03-13  1.2146747
    5 2013-04-09  0.4042683 2013-04-01 -0.1333213 2013-03-12  1.8951935
    6 2013-04-10 -0.1061245 2013-03-31  0.6359504 2013-03-11 -0.4304691
    

    one easy way to do this is to form an index for the columns you want - here I chose the first column of each pair, 1, 3, 5, etc.

    start <- seq(1, by = 2, length = ncol(df) / 2)
    

    Then, we lapply over the indices in start and select from our data frame the ith and ith + 1 columns where i takes each index from start in turn (df[i:(i+1)])

    sdf <- lapply(start, function(i, df) df[i:(i+1)], df = df)
    

    which gives:

    > sdf
    [[1]]
                A           B
    1  2013-04-05  1.37095845
    2  2013-04-06 -0.56469817
    3  2013-04-07  0.36312841
    4  2013-04-08  0.63286260
    5  2013-04-09  0.40426832
    6  2013-04-10 -0.10612452
    7  2013-04-11  1.51152200
    8  2013-04-12 -0.09465904
    9  2013-04-13  2.01842371
    10 2013-04-14 -0.06271410
    
    [[2]]
                C          D
    1  2013-04-05  1.3048697
    2  2013-04-04  2.2866454
    ....
    
    > str(sdf)
    List of 3
     $ :'data.frame':   10 obs. of  2 variables:
      ..$ A: Date[1:10], format: "2013-04-05" "2013-04-06" ...
      ..$ B: num [1:10] 1.371 -0.565 0.363 0.633 0.404 ...
     $ :'data.frame':   10 obs. of  2 variables:
      ..$ C: Date[1:10], format: "2013-04-05" "2013-04-04" ...
      ..$ D: num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
     $ :'data.frame':   10 obs. of  2 variables:
      ..$ E: Date[1:10], format: "2013-03-16" "2013-03-15" ...
      ..$ F: num [1:10] -0.307 -1.781 -0.172 1.215 1.895 ...
    

    An advantage of keeping the sub-data frames in a list is that you can apply a function or other operation to the sub-data frames using a loop or a tools like lapply or sapply for example.