
XTS data taking up too much space in memory?


I am using getSymbols() from quantmod in the following way:

temp0 <- getSymbols("AAPL",src = 'yahoo',from=Sys.Date()-100000,to = Sys.Date(),auto.assign=FALSE);

This gives me a large xts object that uses little memory:

temp0   Large xts (59664 elements, 546.2 Kb)

However, sometimes the xts objects take way too much space. Here are some of my objects.

temp1   Large xts (34848 elements, 25.5 Mb)
t12     Large xts (36 elements, 25.2 Mb)

t12 is the result of head(temp1). Here is t12:

structure(t12)
                    Bid.Price Bid.Size Ask.Price Ask.Size Trade.Price Volume
2019-05-29 17:00:01  116.4922       51  116.5000      143    116.4922    208
2019-05-29 17:00:02  116.4922       71  116.5000      142    116.5000      2
2019-05-29 17:00:04  116.4844      427  116.4922       92    116.4844     72
2019-05-29 17:00:08  116.4922       83  116.5000      156    116.4922     21
2019-05-29 17:01:01  116.4922       71  116.5000      128    116.4922     34
2019-05-29 17:01:08  116.5000       13  116.5078      228    116.4922    192

When I run attributes(t12), I see that $na.action and its attr(,"index") each contain over 2 million values.

temp1 was built from a very large data set, from which I filtered out most of the data, but the object seems to have kept the old, useless data in $na.action and attr(,"index"), if not elsewhere too.

I don't know why this happens, but how can I clean it up? How can I get my 6-row t12 down to its proper minimal size?

If it helps here is the full attributes output with max.print=10:

> attributes(t12)
$class
[1] "xts" "zoo"

$.indexCLASS
[1] "POSIXct" "POSIXt" 

$tclass
[1] "POSIXct" "POSIXt" 

$na.action
 [1]  1  3  6  7  8 12 13 15 16 17
 [ reached getOption("max.print") -- omitted 2201657 entries ]
attr(,"class")
[1] "omit"
attr(,"index")
 [1] 1519772400 1519772402 1519772407 1519772409 1519772410 1519772420 1519772424 1519772428 1519772429 1519772430
 [ reached getOption("max.print") -- omitted 2201657 entries ]

$index
[1] 1559167201 1559167202 1559167204 1559167208 1559167261 1559167268
attr(,"tzone")
[1] "America/Chicago"
attr(,"tclass")
[1] "POSIXct" "POSIXt" 

$dim
[1] 6 6

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "Bid.Price"   "Bid.Size"    "Ask.Price"   "Ask.Size"    "Trade.Price" "Volume"

How can I remove the excess information from the xts?

UPDATE

I seem to have found a workaround to the issue based on my code above.

t12 <- rbind(t12[1,],t12)
t12[1,1] <- NA
t12 <- na.omit(t12)

I duplicate the first row at the top of the data and set its first entry to NA. When I then run na.omit() on the data set, the resulting data is the same as the original t12, but clean, without the extra stale data.

The issue is that I already use na.omit() to create the temp1 set, and I don't know why na.omit() sometimes does not clean the data correctly. Could it be related to very large data sets?


Solution

  • The na.omit() function adds the na.action attribute. This attribute contains the locations and index values for all the observations that were removed. It is added to be consistent with other methods for na.omit().

    Your work-around makes the object smaller because it overwrites the na.action attribute with values that represent the one missing value you added to the beginning of the series.

    Setting the na.action attribute to NULL is a clearer work-around.

    R> x <- .xts(1:1000, 1:1000); set.seed(21); is.na(x) <- sample(1000, 100)
    R> y <- na.omit(x)
    R> object.size(x)
    9176 bytes
    R> object.size(y)
    9720 bytes
    R> attr(y, "na.action") <- NULL
    R> object.size(y)
    8376 bytes
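
To see which attribute is carrying the extra weight, and to check that dropping it leaves the data untouched, something like the following sketch may help. It assumes the xts package is installed; strip_na_action is a hypothetical helper name used only for illustration and is not part of xts.

```r
library(xts)

# Hypothetical helper (not part of xts): drop the bookkeeping that
# na.omit() attaches to its result.
strip_na_action <- function(x) {
  attr(x, "na.action") <- NULL
  x
}

# Build a series with 100 missing values, mirroring the example above.
x <- .xts(1:1000, 1:1000)
set.seed(21)
is.na(x) <- sample(1000, 100)
y <- na.omit(x)

# Memory used by each attribute, in bytes; na.action should stand out
# on an object from which many rows were removed.
attr_sizes <- vapply(attributes(y),
                     function(a) as.numeric(object.size(a)),
                     numeric(1))
print(attr_sizes)

z <- strip_na_action(y)

# The observations and their timestamps are unchanged; only the
# na.action attribute is gone, so the object is smaller.
stopifnot(
  is.null(attr(z, "na.action")),
  nrow(z) == nrow(y),
  identical(as.vector(coredata(z)), as.vector(coredata(y))),
  identical(index(z), index(y)),
  object.size(z) < object.size(y)
)
```

Applied to the objects in the question, the same idea would be attr(temp1, "na.action") <- NULL before taking head(temp1), since subsets appear to inherit the attribute from the object they were taken from.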