I am using getSymbols from quantmod in the following way:
temp0 <- getSymbols("AAPL", src = "yahoo", from = Sys.Date() - 100000, to = Sys.Date(), auto.assign = FALSE)
This returns a large xts object that uses little memory:
temp0 Large xts (59664 elements, 546.2 Kb)
However, some of my xts objects take up far more memory than expected. Here are a few of them:
temp1 Large xts (34848 elements, 25.5 Mb)
t12 Large xts (36 elements, 25.2 Mb)
t12 is the result of head(temp1). Here is t12:
Bid.Price Bid.Size Ask.Price Ask.Size Trade.Price Volume
2019-05-29 17:00:01 116.4922 51 116.5000 143 116.4922 208
2019-05-29 17:00:02 116.4922 71 116.5000 142 116.5000 2
2019-05-29 17:00:04 116.4844 427 116.4922 92 116.4844 72
2019-05-29 17:00:08 116.4922 83 116.5000 156 116.4922 21
2019-05-29 17:01:01 116.4922 71 116.5000 128 116.4922 34
2019-05-29 17:01:08 116.5000 13 116.5078 228 116.4922 192
Using attributes(t12), I found that na.action, as well as its attr(,"index"), contains over 2 million values.
temp1 was built from very large data sets from which I filtered out most of the rows, but the object seems to have kept the old, now-useless data in na.action and attr(,"index"), and possibly elsewhere too.
I don't know why this happens, but how can I clean it up? How can I get my 6-row t12 down to its proper minimal size?
If it helps, here is the full attributes() output with max.print = 10:
> attributes(t12)
[1] "xts" "zoo"
[1] "POSIXct" "POSIXt"
[1] "POSIXct" "POSIXt"
[1] 1 3 6 7 8 12 13 15 16 17
[ reached getOption("max.print") -- omitted 2201657 entries ]
[1] "omit"
[1] 1519772400 1519772402 1519772407 1519772409 1519772410 1519772420 1519772424 1519772428 1519772429 1519772430
[ reached getOption("max.print") -- omitted 2201657 entries ]
[1] 1559167201 1559167202 1559167204 1559167208 1559167261 1559167268
[1] "America/Chicago"
[1] "POSIXct" "POSIXt"
[1] 6 6
[1] "Bid.Price" "Bid.Size" "Ask.Price" "Ask.Size" "Trade.Price" "Volume"
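To confirm which attribute accounts for the size, one option is to measure each attribute separately with object.size(). The sketch below builds a hypothetical stand-in for temp1/t12 (the names big and t12_demo are made up for illustration, since the real data are not available) and assumes that, as in the output above, subsetting with head() carries na.action along.

```r
library(xts)

# Hypothetical stand-in for temp1: a long series with half its rows set to NA
set.seed(1)
big <- .xts(rnorm(1e5), 1:1e5)
is.na(big) <- sample(1e5, 5e4)

# 6 rows, like t12
t12_demo <- head(na.omit(big))

# Memory used by each attribute, in bytes, largest first
attr_sizes <- sapply(attributes(t12_demo),
                     function(a) as.numeric(object.size(a)))
sort(attr_sizes, decreasing = TRUE)
```

On the real object, the same sapply() call on attributes(t12) should point at the attribute holding the 2 million values.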
How can I remove the excess information from the xts?
I seem to have found a workaround, based on the code above:
t12 <- rbind(t12[1,],t12)
t12[1,1] <- NA
t12 <- na.omit(t12)
I prepend a copy of the first row and set its first value to NA. When I then call na.omit() on the result, the remaining data are the same as the original t12, but without the extra data.
The problem is that I already use na.omit() to create temp1, and I don't understand why na.omit() sometimes does not clean the data correctly. Could it be related to very large data sets?
The na.omit() function adds the na.action attribute. This attribute contains the locations and index values for all the observations that were removed, and it is kept when you subset the object, which is why head(temp1) still carries it. It is added to be consistent with other methods for na.omit().
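A minimal sketch of what that attribute looks like on a tiny series (the object names here are illustrative only): the removed row positions are the values, the removed index values sit in an "index" attribute, and the class is "omit", matching the attributes() output in the question.

```r
library(xts)

# Ten observations indexed at 1..10 seconds after the epoch
x <- .xts(1:10, 1:10)
is.na(x) <- c(2, 5)      # mark two observations as missing
y <- na.omit(x)

na_act <- attr(y, "na.action")
class(na_act)            # "omit"
as.integer(na_act)       # removed row positions
attr(na_act, "index")    # index values of the removed rows
```

The attribute grows with the number of removed rows, so filtering millions of rows leaves millions of entries behind.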
Your work-around makes the object smaller because it overwrites the na.action attribute with values that represent the one missing value you added to the beginning of the series.
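The effect of the question's work-around can be checked on simulated data. This sketch uses a made-up t12_demo in place of t12 and shows that, after the extra na.omit(), na.action records only the single row that was added, while the data match the original six rows.

```r
library(xts)

set.seed(21)
x <- .xts(1:1000, 1:1000)
is.na(x) <- sample(1000, 100)
t12_demo <- head(na.omit(x))          # hypothetical stand-in for t12

# The work-around from the question
w <- rbind(t12_demo[1, ], t12_demo)   # duplicate the first row
w[1, 1] <- NA                         # make one copy missing
w <- na.omit(w)                       # replaces the old na.action

length(attr(w, "na.action"))          # one removed row recorded
```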
Setting the na.action attribute to NULL is a clearer work-around.
R> x <- .xts(1:1000, 1:1000); set.seed(21); is.na(x) <- sample(1000, 100)
R> y <- na.omit(x)
R> object.size(x)
9176 bytes
R> object.size(y)
9720 bytes
R> attr(y, "na.action") <- NULL
R> object.size(y)
8376 bytes
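Since the attribute is copied into every later subset, one way to keep objects like t12 small is to drop it right where temp1 is created. A minimal sketch, using a hypothetical helper named na_omit_clean (not part of xts):

```r
library(xts)

# Hypothetical helper: na.omit() without keeping the na.action record
na_omit_clean <- function(x) {
  y <- na.omit(x)
  attr(y, "na.action") <- NULL
  y
}

set.seed(21)
x <- .xts(1:1000, 1:1000)
is.na(x) <- sample(1000, 100)
y <- na_omit_clean(x)

is.null(attr(y, "na.action"))         # nothing left to carry along
is.null(attr(head(y), "na.action"))
```

This is the same attr(y, "na.action") <- NULL step shown above, just applied once at the source instead of on each derived object.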