Control the minimal number of required values when aggregating xts objects


When working with, for example, daily air temperature data, I'd like finer control over how many observations may be missing before the monthly average (or whatever aggregation function is used) is no longer calculated.

library(xts)

# init xts object
datetimes <- seq("2000-01-01" |> as.POSIXct(tz = "UTC"), 
                 "2000-05-31" |> as.POSIXct(tz = "UTC"),
                 by = "day")

values <- rep(1, length(datetimes))

x <- xts(values, order.by = datetimes)

# create some artificial gaps
zoo::coredata(x["2000-01-03"]) <- NA
zoo::coredata(x["2000-02-01/2000-02-05"]) <- rep(NA, 5)
zoo::coredata(x["2000-04-01/2000-04-29"]) <- rep(NA, 29)

Given this data, I can control whether missing data is allowed at all, but only in a binary way, by passing na.rm to the aggregation function:

# monthly aggregates respecting NA values
aggregate(x, format(time(x), "%m"), sum)
#>      
#> 01 NA
#> 02 NA
#> 03 31
#> 04 NA
#> 05 31
# monthly aggregates neglecting NA values
aggregate(x, format(time(x), "%m"), sum, na.rm = TRUE)
#>      
#> 01 30
#> 02 24
#> 03 31
#> 04  1
#> 05 31

What I need is a way to set the fraction of missing values that is allowed, maybe something like the partial argument of zoo::rollapply().

Expected output (with min. required values set to e.g. 80 %):

aggregate(x, format(time(x), "%m"), sum, SOME_MAGIC = 0.8)
#> 01 30
#> 02 24
#> 03 31
#> 04 NA
#> 05 31

Solution

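  • You could put a condition inside the function passed to aggregate(), here requiring at least 80 % of the values in each month to be present:

    aggregate(x, format(time(x), "%m"), function(v)
      # sum the available values only if at least 80 % of them are present
      if (mean(!is.na(v)) >= 0.8) sum(v, na.rm = TRUE) else NA_real_)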
    
    01 30
    02 24
    03 31
    04 NA
    05 31
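
If the threshold is needed in more than one place, the condition could be wrapped in a small helper. The following is only a sketch: agg_min_frac and its min_frac argument are made-up names, not part of xts or zoo, and the check uses base R's tapply() on a plain vector with the same gaps as in the question, so that it runs without any packages.

```r
# Hypothetical helper (not part of xts or zoo): apply FUN to the non-missing
# values, but only if at least `min_frac` of all values are present.
agg_min_frac <- function(v, FUN = sum, min_frac = 0.8, ...) {
  if (length(v) == 0L || mean(!is.na(v)) < min_frac) return(NA_real_)
  FUN(v[!is.na(v)], ...)
}

# Same daily series and gaps as in the question, as a plain numeric vector
days <- seq(as.Date("2000-01-01"), as.Date("2000-05-31"), by = "day")
vals <- rep(1, length(days))
vals[days == as.Date("2000-01-03")] <- NA
vals[days >= as.Date("2000-02-01") & days <= as.Date("2000-02-05")] <- NA
vals[days >= as.Date("2000-04-01") & days <= as.Date("2000-04-29")] <- NA

# Monthly sums; months with less than 80 % of their values present become NA
monthly <- tapply(vals, format(days, "%m"), agg_min_frac, min_frac = 0.8)
print(monthly)
#> 01 02 03 04 05 
#> 30 24 31 NA 31
```

Since aggregate() passes extra arguments on to the function (as it does with na.rm = TRUE above), the same helper should also work on the xts object as aggregate(x, format(time(x), "%m"), agg_min_frac, min_frac = 0.8), and with FUN = mean for monthly averages.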