Search code examples
rlistreorganize

How to reorganize data from a list by rownames into new dataframes? R


I have multiple xts objects stored in a list each with 1000+ rows. They represent stock rolling window regression data. Each element has it's unique Ticker name. Here they are called Stock1, 2 ...etc for testing purposes. Rows are named by date as is the xts format. Each element is equal in dimensions. Each one looks like this:

  > tail(testlist$Stock1, n = 3)
            (Intercept)     rmrf         smb        hml        rmw      cma
2014-12-29 0.0003223177 1.010215 -0.02164844 -0.3322500 0.07819563 1.106934
2014-12-30 0.0002631315 1.002356 -0.02351438 -0.3465390 0.05954400 1.118506
2014-12-31 0.0002837304 1.000084 -0.01619536 -0.3494401 0.06121434 1.124845
> tail(testlist$Stock2, n = 3)
            (Intercept)      rmrf        smb         hml        rmw       cma
2014-12-29 0.0003308951 0.7503819 -0.1967255 -0.10242616 -0.2264914 0.8329570
2014-12-30 0.0003051495 0.7409709 -0.1899856 -0.07461764 -0.2240448 0.7921883
2014-12-31 0.0002614874 0.7478099 -0.1833077 -0.06197362 -0.2056615 0.7550211
> tail(testlist$Stock3, n = 3)
             (Intercept)      rmrf        smb       hml        rmw        cma
2014-12-29 -0.0003803988 0.8363603 -0.4153470 0.7459769 -0.7981382 -0.2839360
2014-12-30 -0.0004121386 0.8352243 -0.4224404 0.7405976 -0.8114066 -0.2790438
2014-12-31 -0.0004660716 0.8355641 -0.4343012 0.7571033 -0.8057412 -0.3026019
> tail(testlist$Stock4, n = 3)
             (Intercept)      rmrf         smb         hml        rmw       cma
2014-12-29 -0.0008295692 0.9296299 -0.07776571 0.007084297 -0.1377356 0.8038542
2014-12-30 -0.0007734696 0.9383387 -0.08941983 0.011685507 -0.1092656 0.7863335
2014-12-31 -0.0007591168 0.9391670 -0.08782070 0.015619229 -0.1083707 0.7924232

What i need to do: Merge the rows by name and by aggregating all data in my list to obtain a new set of data. Each should look like this:

 Name    Date      (Intercept)     rmrf        smb        hml        rmw         cma

Stock1 2014-12-29  0.0003223177 1.010215  -0.02164844 -0.3322500  0.07819563  1.106934
Stock2 2014-12-29  0.0003308951 0.7503819 -0.1967255  -0.10242616 -0.2264914  0.8329570
Stock3 2014-12-29 -0.0003803988 0.8363603 -0.4153470   0.7459769  -0.7981382 -0.2839360
Stock4 2014-12-29 -0.0008295692 0.9296299 -0.07776571  0.007084297 -0.1377356 0.8038542

Each such element should not be a time-series any more. but a static one, with each stock representing it's coeffiecient values at time "t". In terms of size each element should have a number of rows equal to the number of Stocks in the original list.

EDIT as asked by josilber

   > dput(list(Stock1=tail(testlist$Stock1, n = 3), Stock2=tail(testlist$Stock2, n = 3)))

structure(list(Stock1 = structure(c(0.000322317700198485, 0.000263131488679374, 
0.000283730373928844, 1.01021497011709, 1.00235580055438, 1.00008407331697, 
-0.0216484434660844, -0.023514378867335, -0.0161953614672028, 
-0.332250031553704, -0.346538978804535, -0.349440052163927, 0.078195628743663, 
0.0595439997647003, 0.0612143446991752, 1.1069343396633, 1.11850626745067, 
1.12484530131584), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1419811200, 
1419897600, 1419984000), tzone = "UTC", tclass = "Date"), .Dim = c(3L, 
6L), .Dimnames = list(NULL, c("(Intercept)", "rmrf", "smb", "hml", 
"rmw", "cma"))), Stock2 = structure(c(0.000330895099805035, 0.000305149500450527, 
0.000261487411574969, 0.750381906747217, 0.740970893865186, 0.747809929767095, 
-0.1967254672836, -0.189985607343021, -0.183307667378927, -0.10242615734439, 
-0.0746176364711423, -0.0619736225998069, -0.226491384004977, 
-0.224044849587752, -0.205661480898329, 0.832956994676299, 0.792188348360969, 
0.755021100668421), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1419811200, 
1419897600, 1419984000), tzone = "UTC", tclass = "Date"), .Dim = c(3L, 
6L), .Dimnames = list(NULL, c("(Intercept)", "rmrf", "smb", "hml", 
"rmw", "cma")))), .Names = c("Stock1", "Stock2"))

I am completely in the dark. I have looked at some functions that may come to use: lapply / also the merge function seems to be suitable but it only works on 2 elements.

I will continue to update this post as i search for answers. If anyone has any leads or has done this before and can point in the right direction, thank you!

EDIT

#Flatten data add one more name and put into one data frame 

all_coef_data<- do.call(rbind,Map(cbind,
                  Name=names(testlist),
                  Date=lapply(testlist,function(x) as.Date(as.POSIXct(c(attr(x,'index')),origin='1970-01-01'))),
                  lapply(testlist, as.data.frame)
))

A common denominator in each row that i need to get out is the Date. I now split the dataframe by Date using this. The output is a list.

out <- split(all_coef_data , f = all_coef_data$Date )

output:

> head(out$'2011-05-23', n=3)
                    Name       Date   (Intercept)     rmrf         smb        hml         rmw        cma
Stock1.2011-05-23 Stock1 2011-05-23 -4.376389e-04 1.103582 -0.21747611 -0.1879211 -0.05849794 -0.1949192
Stock2.2011-05-23 Stock2 2011-05-23  1.115140e-04 1.198622  0.05422819  0.9998529  0.92141407 -0.8565260
Stock3.2011-05-23 Stock3 2011-05-23  5.457214e-05 1.303025  0.04705294  0.6897673 -0.19708983 -0.8247877
> tail(out$'2011-05-23', n=3)
                      Name       Date  (Intercept)     rmrf        smb         hml         rmw        cma
Stock48.2011-05-23 Stock48 2011-05-23 0.0007354997 0.505054  0.1774544 -0.38934089  0.71775909  0.5189329
Stock49.2011-05-23 Stock49 2011-05-23 0.0004224351 1.304719  0.4511903 -0.64937062 -0.08872941  0.1545058
Stock50.2011-05-23 Stock50 2011-05-23 0.0003851261 1.020434 -0.1107910 -0.03964192  0.09526658 -0.4961902

Solution

  • Sounds like you want to

    1. flatten the xts objects to data.frames using as.data.frame(),
    2. cbind() new columns Name (from list component names) and Date (from xts row names, which actually come from the index attribute), and finally
    3. rbind() everything together into a single data.frame.

    do.call(rbind,Map(cbind,
        Name=names(testlist),
        Date=lapply(testlist,function(x) as.Date(as.POSIXct(c(attr(x,'index')),origin='1970-01-01'))),
        lapply(testlist,as.data.frame)
    ));
    ##                     Name       Date  (Intercept)      rmrf         smb         hml         rmw       cma
    ## Stock1.2014-12-29 Stock1 2014-12-29 0.0003223177 1.0102150 -0.02164844 -0.33225003  0.07819563 1.1069343
    ## Stock1.2014-12-30 Stock1 2014-12-30 0.0002631315 1.0023558 -0.02351438 -0.34653898  0.05954400 1.1185063
    ## Stock1.2014-12-31 Stock1 2014-12-31 0.0002837304 1.0000841 -0.01619536 -0.34944005  0.06121434 1.1248453
    ## Stock2.2014-12-29 Stock2 2014-12-29 0.0003308951 0.7503819 -0.19672547 -0.10242616 -0.22649138 0.8329570
    ## Stock2.2014-12-30 Stock2 2014-12-30 0.0003051495 0.7409709 -0.18998561 -0.07461764 -0.22404485 0.7921883
    ## Stock2.2014-12-31 Stock2 2014-12-31 0.0002614874 0.7478099 -0.18330767 -0.06197362 -0.20566148 0.7550211
    

    If you don't like the new row names, you can wrap the line in `rownames<-`(...,NULL).