Search code examples
rrollapply

Problems with Rollapplyr function due to NA values (that should not be removed) in R


I have a dataframe:

      date comp  ei
1   1/1/73    A  NA
2   1/4/73    A 0.6
3   1/7/73    A 0.7
4  1/10/73    A 0.9
5   1/1/74    A 0.4
6   1/4/74    A 0.5
7   1/7/74    A 0.7
8  1/10/74    A 0.7
9   1/1/75    A 0.4
10  1/4/75    A 0.5
11  1/1/73    B 0.8
12  1/4/73    B 0.8
13  1/7/73    B 0.5
14 1/10/73    B 0.6
15  1/1/74    B 0.3
16  1/4/74    B 0.2
17  1/1/73    C  NA
18  1/4/73    C 0.6
19  1/7/73    C 0.4
20 1/10/73    C 0.8
21  1/1/74    C 0.7
22  1/4/74    C 0.9
23  1/7/74    C 0.4
24 1/10/74    C 0.3

I want to calculate the rolling std. deviation of ei grouped by comp. I want the rolling standard deviation of the last 8 lines - but if only 6 lines exists, so far, it should still take the rolling std. deviation of those. So I use width = 8 and partial = 6 in this code:

roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)  
df <- transform(df, roll = ave(ei, comp, FUN = roll)) 

However, due to the fact that some of my 'ei' values are 'NA' the partial part of the function doesn't work, since there is an NA in one of the past 8 lines. So of course after 6 lines the std. dev. is NA. Only for comp = B, the partial = 6 works. The results are seen below:

      date comp  ei      roll
1   1/1/73    A  NA        NA
2   1/4/73    A 0.6        NA
3   1/7/73    A 0.7        NA
4  1/10/73    A 0.9        NA
5   1/1/74    A 0.4        NA
6   1/4/74    A 0.5        NA
7   1/7/74    A 0.7        NA
8  1/10/74    A 0.7        NA
9   1/1/75    A 0.4 0.1726888
10  1/4/75    A 0.5 0.1772811
11  1/1/73    B 0.8        NA
12  1/4/73    B 0.8        NA
13  1/7/73    B 0.5        NA
14 1/10/73    B 0.6        NA
15  1/1/74    B 0.3        NA
16  1/4/74    B 0.2 0.2503331
17  1/1/73    C  NA        NA
18  1/4/73    C 0.6        NA
19  1/7/73    C 0.4        NA
20 1/10/73    C 0.8        NA
21  1/1/74    C 0.7        NA
22  1/4/74    C 0.9        NA
23  1/7/74    C 0.4        NA
24 1/10/74    C 0.3        NA

I would have rather wanted my results to look as it does below, where the first std. dev is calculated for comp A in line number 7 for the previous 6 values (not NA) and where comp C has a std. dev in line 23 and 24:

      date comp  ei      roll
1   1/1/73    A  NA        NA
2   1/4/73    A 0.6        NA
3   1/7/73    A 0.7        NA
4  1/10/73    A 0.9        NA
5   1/1/74    A 0.4        NA
6   1/4/74    A 0.5        NA
7   1/7/74    A 0.7 0.1751190
8  1/10/74    A 0.7 0.1618347
9   1/1/75    A 0.4 0.1726888
10  1/4/75    A 0.5 0.1772811
11  1/1/73    B 0.8        NA
12  1/4/73    B 0.8        NA
13  1/7/73    B 0.5        NA
14 1/10/73    B 0.6        NA
15  1/1/74    B 0.3        NA
16  1/4/74    B 0.2 0.2503331
17  1/1/73    C  NA        NA
18  1/4/73    C 0.6        NA
19  1/7/73    C 0.4        NA
20 1/10/73    C 0.8        NA
21  1/1/74    C 0.7        NA
22  1/4/74    C 0.9        NA
23  1/7/74    C 0.4 0.2065591
24 1/10/74    C 0.3 0.2267787

How can I do this without running a na.omit code before calculating the rolling std. dev? The reason why I don't want to remove NA's is that I need the lines with comp and dates (plus other columns in my real dataset). Also, removing my NA values might, in my real dataset, lead to removing NA's in the middle of a period so that the rolling std. dev. function won't fit with the dates and my results will be wrong.

Is there a way to deal with this without removing the NA values?


Solution

  • 1) FUN computes sd if there are at least 6 non-NAs and otherwise returns NA. Then proceed as in the question.

    library(zoo)
    
    df$date <- as.Date(df$date, "%d/%m/%y")
    FUN <- function(x) if (length(na.omit(x)) >= 6) sd(x, na.rm = TRUE) else NA
    roll <- function(z) rollapplyr(z, width = 8, FUN = FUN, 
          fill = NA, partial = 6)  
    transform(df, roll = ave(ei, comp, FUN = roll)) 
    

    2) The other possibility is to use na.omit and then merge the result back with the original data frame.

    library(zoo)
    
    df$date <- as.Date(df$date, "%d/%m/%y")
    roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)  
    
    df_roll_0 <- transform(na.omit(df), roll = ave(ei, comp, FUN = roll)) 
    df_roll_m <- merge(df, df_roll_0, all = TRUE)
    o <- with(df_roll_m, order(comp, date))
    df_roll <- df_roll_m[o, ]
    

    2a) This could also be expressed using dplyr/tidyr:

    library(dplyr)
    library(tidyr)
    library(zoo)
    df$date <- as.Date(df$date, "%d/%m/%y")
    roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)  
    
    df_roll_0 <- df %>%
      drop_na %>%
      group_by(comp) %>%
      mutate(roll = roll(ei)) %>%
      ungroup
    
    df %>%
      left_join(df_roll_0)
    

    Note

    Lines <- "      date comp  ei
    1   1/1/73    A  NA
    2   1/4/73    A 0.6
    3   1/7/73    A 0.7
    4  1/10/73    A 0.9
    5   1/1/74    A 0.4
    6   1/4/74    A 0.5
    7   1/7/74    A 0.7
    8  1/10/74    A 0.7
    9   1/1/75    A 0.4
    10  1/4/75    A 0.5
    11  1/1/73    B 0.8
    12  1/4/73    B 0.8
    13  1/7/73    B 0.5
    14 1/10/73    B 0.6
    15  1/1/74    B 0.3
    16  1/4/74    B 0.2
    17  1/1/73    C  NA
    18  1/4/73    C 0.6
    19  1/7/73    C 0.4
    20 1/10/73    C 0.8
    21  1/1/74    C 0.7
    22  1/4/74    C 0.9
    23  1/7/74    C 0.4
    24 1/10/74    C 0.3"
    df <- read.table(text = Lines)