Tags: r, time-series, interpolation, xts, zoo

Using xblocks for interpolated values across time series in R


I'm having trouble figuring out how to get xblocks() to work. First, here's a small example from a much larger dataset:

# Dates are stored as Date objects and oxygen values as numbers,
# so the example is reproducible and can be averaged directly.
data <- data.frame(
    Date = as.Date(c("1993-07-05", "1993-07-05", "1993-07-05", "1993-08-30", "1993-08-30", "1993-08-30", "1993-08-30", "1993-09-04", "1993-09-04")),
    Oxygen = c(0.9, 0.4, 4.2, 5.6, 7.3, NA, 9.5, NA, 0.3))

I then averaged values for each month using xts:

xtsAveragedata <- xts(Averagedata[-1], Averagedata[[1]])
xtsAverageMonthlyData <- apply.monthly(xtsAveragedata, FUN = mean)

Then I linearly interpolated the data:

Interpolateddata <- na.approx(xtsAverageMonthlyData)

I want to create a figure in which I use xblocks() or something similar to shade the regions of my data where I used interpolation, similar to an example I found online.

How do I go about doing this for all values, ideally automated for my entire dataset? There are no examples in the reference guide that I could translate into something like this.

Thank you for your help. It is greatly appreciated.


Solution

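  • This doesn't use xts or zoo, but the walkthrough may still be helpful. I am using a slightly larger daily dataset, and it should be reproducible. Note that it fills each missing value with that month's mean rather than interpolating linearly, then shades the days that were filled: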

    library(tidyverse)
    library(lubridate)
    
    set.seed(4)
    df <- tibble(
      Date = seq.Date(ymd("1993-07-01"), by = "1 day", length.out = 100),
      Oxygen = runif(100, 0, 10)
    )
    
    # Randomly assign 20 records to NA
    df[sample(1:nrow(df), 20), "Oxygen"] <- NA
    
    df_for_plot <- df %>%
      arrange(Date) %>%
      group_by(month(Date)) %>%
      mutate(
        is_na = is.na(Oxygen),
        month_avg = mean(Oxygen, na.rm = TRUE),
        oxygen_to_plot = if_else(is_na, month_avg, Oxygen)
      )
    
    df_for_plot
    #> # A tibble: 100 x 6
    #> # Groups:   month(Date) [4]
    #>    Date        Oxygen `month(Date)` is_na month_avg oxygen_to_plot
    #>    <date>       <dbl>         <dbl> <lgl>     <dbl>          <dbl>
    #>  1 1993-07-01  5.86               7 FALSE      5.87         5.86  
    #>  2 1993-07-02  0.0895             7 FALSE      5.87         0.0895
    #>  3 1993-07-03  2.94               7 FALSE      5.87         2.94  
    #>  4 1993-07-04  2.77               7 FALSE      5.87         2.77  
    #>  5 1993-07-05  8.14               7 FALSE      5.87         8.14  
    #>  6 1993-07-06 NA                  7 TRUE       5.87         5.87  
    #>  7 1993-07-07  7.24               7 FALSE      5.87         7.24  
    #>  8 1993-07-08  9.06               7 FALSE      5.87         9.06  
    #>  9 1993-07-09  9.49               7 FALSE      5.87         9.49  
    #> 10 1993-07-10  0.731              7 FALSE      5.87         0.731 
    #> # ... with 90 more rows
    
    # Plot the regular data, but for the geom_rect use only the filtered data where the is_na column is TRUE.
    # Assuming you have daily data, you just set the xmax to be that Date + 1.
    ggplot(df_for_plot, aes(x = Date, y = oxygen_to_plot)) +
      geom_line() +
      geom_rect(
        data = df_for_plot %>% filter(is_na), 
        aes(xmin = Date, xmax = Date + 1, ymin = -Inf, ymax = +Inf), fill = "skyblue", alpha = 0.5
      )
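
If you would rather stay within zoo, a minimal sketch along the following lines might work with xblocks(). It assumes daily data like the example above; the names dates, oxygen, was_na and z_interp are illustrative and do not come from the question. The idea is to record where the NAs were before calling na.approx(), then pass that logical vector to xblocks() so that blocks are drawn wherever it is TRUE.

```r
library(zoo)

set.seed(4)
dates  <- seq(as.Date("1993-07-01"), by = "day", length.out = 100)
oxygen <- runif(100, 0, 10)

# Randomly set 20 interior records to NA; the first and last values are
# kept so that linear interpolation can fill every gap.
oxygen[sample(2:99, 20)] <- NA

z <- zoo(oxygen, dates)

# Remember which observations were missing before interpolating.
was_na <- is.na(oxygen)

# Linear interpolation of the missing values.
z_interp <- na.approx(z)

# Plot the interpolated series, then shade the originally missing dates.
plot(z_interp, xlab = "Date", ylab = "Oxygen")
xblocks(dates, was_na, col = adjustcolor("skyblue", alpha.f = 0.5))
```

Because was_na is computed from the raw data, the same two plotting lines should apply to a longer series without changes. For monthly averages, the logical vector would have to be built from the aggregated series (before na.approx()) rather than from the daily values.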