Search code examples
rxts

Merging xts in R - Converting Characters to NA


I have 3 xts objects

logged <- xts::xts(x = loggedInUsers$loggedInUsers, order.by = Sys.time())
loadValue <- xts::xts(x = loadAvg, order.by = Sys.time())
hostname <- xts::xts(x = loadHost, order.by = Sys.time())

dput(hostname)
dput(loadValue)
dput(logged)

dput gives the following result

 structure("deliverforgoodportal", .Dim = c(1L, 1L), index = structure(1551088127.27724, tzone = "", tclass = c("POSIXct",
    "POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
    "POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")

structure(0, .Dim = c(1L, 1L), .Dimnames = list(NULL, "load"), index = structure(1551088127.27676, tzone = "", tclass = c("POSIXct",
"POSIXt")), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "", tzone = "", class = c("xts", "zoo"))

structure(1, .Dim = c(1L, 1L), index = structure(1551088127.27637, tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")

When I am merging this three and printing the hostname is converted to NA

  tmp <- merge.xts(hostname, logged, loadValue, all = TRUE)
    print(tmp)

Output is: (hostname is NA)

                    hostname logged  load
2019-02-25 09:48:47       NA      1    NA
2019-02-25 09:48:47       NA     NA    0
2019-02-25 09:48:47       NA     NA    NA

Why is this coming as NA?


Solution

  • You should realise that an xts object is a timeseries and a matrix. Now a matrix can only contain one type of values, either character or numeric. But not both. Your merge is trying to combine a character value matrix (hostname) with numeric values (logged and load). This results in the hostname values being coerced to NA.

    If you want to join this data, you have to use a data.frame (or data.table). Also note that your time values are not equal, they are of by milliseconds. So if you want to join on minutes, first use floor_date from the lubridate package. See below two examples with and without lubridate. I use the package timetk to convert the xts objects to a tibble, but depending on your source data that might not be necessary.

    with full_join, no lubridate

    library(timetk)
    library(dplyr)
    hostname <- tk_tbl(hostname)
    loadValue <- tk_tbl(loadValue)
    logged <- tk_tbl(logged)
    
    hostname %>% 
      full_join(loadValue) %>% 
      full_join(logged, 
                by = "index", 
                suffix = c("_hostname", "_logged"))
    
    Joining, by = "index"
    # A tibble: 3 x 4
      index               value_hostname        load value_logged
      <dttm>              <chr>                <dbl>        <dbl>
    1 2019-02-25 10:48:47 deliverforgoodportal    NA           NA
    2 2019-02-25 10:48:47 NA                       0           NA
    3 2019-02-25 10:48:47 NA                      NA            1
    

    with lubridate and left join:

    hostname %>% 
      mutate(index = lubridate::floor_date(index, unit = "seconds")) %>% 
      left_join(loadValue %>% mutate(index = lubridate::floor_date(index, unit = "seconds"))) %>% 
      left_join(logged %>% mutate(index = lubridate::floor_date(index, unit = "seconds")), 
                by = "index", 
                suffix = c("_hostname", "_logged"))    
    
    Joining, by = "index"
    # A tibble: 1 x 4
      index               value_hostname        load value_logged
      <dttm>              <chr>                <dbl>        <dbl>
    1 2019-02-25 10:48:47 deliverforgoodportal     0            1