Search code examples
rxts

R Compare two XTS is losing last date


I have two XTS for same dates. When I compare the two, the result eliminates the last row: For example, both a and b have 20 dates and a single column. a>b however gives 19 rows only

a <- structure(c(2433.3, 2423, 2408.85, 2390, 2592.25, 2492.95, 2447, 
2397, 2298.95, 2286, 2306.3, 2301.8, 2342, 2341, 2328, 2338, 
2302.95, 2322.05, 2360, 2353.7), class = c("xts", "zoo"), index = structure(c(1613001600, 
1613088000, 1613347200, 1613433600, 1613520000, 1613606400, 1613692800, 
1613952000, 1614038400, 1614124800, 1614211200, 1614297600, 1614556800, 
1614643200, 1614729600, 1614816000, 1614902400, 1615161600, 1615248000, 
1615357396), tzone = "UTC", tclass = "Date"), .Dim = c(20L, 1L
), .Dimnames = list(NULL, "Close"))


b <- structure(c(2410, 2390.1, 2391.4, 2445, 2426.05, 2415.9, 2390, 
2617.6, 2492, 2474.8, 2430, 2330, 2320, 2280, 2342.25, 2346, 
2342.05, 2318, 2338, 2310), class = c("xts", "zoo"), index = structure(c(1613001600, 
1613088000, 1613347200, 1613433600, 1613520000, 1613606400, 1613692800, 
1613952000, 1614038400, 1614124800, 1614211200, 1614297600, 1614556800, 
1614643200, 1614729600, 1614816000, 1614902400, 1615161600, 1615248000, 
1615334400), tzone = "UTC", tclass = "Date"), .Dim = c(20L, 1L
), .Dimnames = list(NULL, "Lag.2"))

a>b

dput (a>b)
structure(c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, 
FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, 
TRUE, TRUE), .Dim = c(19L, 1L), .Dimnames = list(NULL, "Close"), index = structure(c(1613001600, 
1613088000, 1613347200, 1613433600, 1613520000, 1613606400, 1613692800, 
1613952000, 1614038400, 1614124800, 1614211200, 1614297600, 1614556800, 
1614643200, 1614729600, 1614816000, 1614902400, 1615161600, 1615248000
), tzone = "UTC", tclass = "Date"), class = c("xts", "zoo"))

Solution

  • Difficult to spot but there is a difference between the index of a and the index of b The last timestamps of a and b are:

    a = 1615357396

    b = 1615334400

    Which if you format them to see what they are:

    format(as.POSIXct(1615357396, tz = "UTC", origin = "1970-01-01"), "%Y-%m-%d %H:%M:%S")
    [1] "2021-03-10 00:00:00"
    format(as.POSIXct(1615334400, tz = "UTC", origin = "1970-01-01"), "%Y-%m-%d %H:%M:%S")
    [1] "2021-03-10 06:23:16"
    

    As you can see these are 2 different timestamps. If you merge a and be based on what you have you get:

     merge(a, b)
                 Close   Lag.2
    2021-02-11 2433.30 2410.00
    2021-02-12 2423.00 2390.10
    2021-02-15 2408.85 2391.40
    2021-02-16 2390.00 2445.00
    2021-02-17 2592.25 2426.05
    2021-02-18 2492.95 2415.90
    2021-02-19 2447.00 2390.00
    2021-02-22 2397.00 2617.60
    2021-02-23 2298.95 2492.00
    2021-02-24 2286.00 2474.80
    2021-02-25 2306.30 2430.00
    2021-02-26 2301.80 2330.00
    2021-03-01 2342.00 2320.00
    2021-03-02 2341.00 2280.00
    2021-03-03 2328.00 2342.25
    2021-03-04 2338.00 2346.00
    2021-03-05 2302.95 2342.05
    2021-03-08 2322.05 2318.00
    2021-03-09 2360.00 2338.00
    2021-03-10      NA 2310.00
    2021-03-10 2353.70      NA
    

    You can see that the last 2 records have the same day but have a different index date. If you compare a > b you are comparing with NA and these records will be removed.

    One way around this is to do as.xts before comparing:

    as.xts(a) > as.xts(b)
               Close
    2021-02-11  TRUE
    2021-02-12  TRUE
    2021-02-15  TRUE
    2021-02-16 FALSE
    2021-02-17  TRUE
    2021-02-18  TRUE
    2021-02-19  TRUE
    2021-02-22 FALSE
    2021-02-23 FALSE
    2021-02-24 FALSE
    2021-02-25 FALSE
    2021-02-26 FALSE
    2021-03-01  TRUE
    2021-03-02  TRUE
    2021-03-03 FALSE
    2021-03-04 FALSE
    2021-03-05 FALSE
    2021-03-08  TRUE
    2021-03-09  TRUE
    2021-03-10  TRUE
    

    Better would be to make sure you have the correct timestamps in the first place. But this depends on where the input data is coming from.