Search code examples
rutczoo

double timestamp in dataframe merge


I do have a df looking like this:

time,v1,v1,v3,v4
1352639505, , ,94,101
1352639565, , ,94,101
1352639505,10,222, ,
1352639565,11,221, ,

First one is UTC Timestamp from 1970-01-01 - so I would use as.POSIXct(df$time,origin="1970-01-01",tz="UTC"). But as you can see date is double. NAs and values are skipped. How to merge the same date in the dataframe? First setting as.POSIXct or merging?

Result should look like this:

time,v1,v1,v3,v4
1352639505,10,222,94,101
1352639565,11,221,94,101

Of course it could also be zoo afterwads with real date! Thanks!


Solution

  • I would treat each column of your data.frame individually. For each column, drop the missing observations from the column and the time index, then construct your zoo object. Then you can use do.call to call merge on your list of zoo objects.

    # helper function to build zoo objects with no missing values
    f <- function(v, i) {
      na <- is.na(v)
      iposix <- as.POSIXct(i, origin="1970-01-01", tz="UTC")
      zoo(v[!na], iposix[!na])
    }
    df <- structure(list(time = c(1352639505L, 1352639565L, 1352639505L, 
    1352639565L), v1 = c(NA, NA, 10L, 11L), v1 = c(NA, NA, 222L, 
    221L), v3 = c(94L, 94L, NA, NA), v4 = c(101L, 101L, NA, NA)), .Names = c("time", 
    "v1", "v1", "v3", "v4"), class = "data.frame", row.names = c(NA, -4L))
    library(zoo)
    Data <- do.call(merge, lapply(df[-1], f, i=df$time))
    Data
    #                     v1 v1.1 v3  v4
    # 2012-11-11 07:11:45 10  222 94 101
    # 2012-11-11 07:12:45 11  221 94 101