Tags: r, na, zoo, locf

na.locf remove leading NAs, keep others


I have a question regarding the na.locf function in the zoo package. In the data frame below, I want to remove the leading NAs (years 1987 and 1988) but keep any NA that follows a valid value, such as the one for 1993 (where 1992 has a value).

Year     X
1987     NA
1988     NA
1989     2
1990     5
1991     9
1992     16
1993     NA
1994     27
1995     36

Does anyone have a solution for this problem?


Solution

  • The na.locf function is designed for filling missing observations, not removing them. The zoo package also has an na.trim function, which removes leading and/or trailing missing observations:

    library(zoo)
    na.trim(mydf)
    

    which gives:

    > na.trim(mydf)
      Year  X
    3 1989  2
    4 1990  5
    5 1991  9
    6 1992 16
    7 1993 NA
    8 1994 27
    9 1995 36
    

    With the sides parameter you can choose whether to remove only leading missing observations, only trailing ones, or both (the default). For example, sides = 'right' removes only the trailing missing observations and keeps the leading ones:

    > na.trim(mydf, sides = 'right')
      Year  X
    1 1987 NA
    2 1988 NA
    3 1989  2
    4 1990  5
    5 1991  9
    6 1992 16
    7 1993 NA
    8 1994 27
    9 1995 36
    

    Conversely, sides = 'left' removes only the leading missing observations and keeps the trailing ones, which is what the question asks for:

    > na.trim(mydf, sides = 'left')
       Year  X
    3  1989  2
    4  1990  5
    5  1991  9
    6  1992 16
    7  1993 NA
    8  1994 27
    9  1995 36
    10 1996 NA
    

    Data used (the question's data plus a trailing NA for 1996, to show the effect of sides):

    mydf <- structure(list(Year = 1987:1996, X = c(NA, NA, 2L, 5L, 9L, 16L, NA, 27L, 36L, NA)),
                      .Names = c("Year", "X"), class = "data.frame", row.names = c(NA,-10L))
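
If loading zoo is not an option, the left-side trim can be approximated in base R. This is a minimal sketch, not zoo's implementation: it assumes that a single column (X) decides what counts as missing, and the names keep and trimmed_left are made up for illustration.

```r
# Same data as in the answer above
mydf <- data.frame(Year = 1987:1996,
                   X = c(NA, NA, 2L, 5L, 9L, 16L, NA, 27L, 36L, NA))

# cumsum() counts the non-missing values seen so far, so the comparison
# is FALSE for the leading NAs and TRUE from the first valid X onwards
keep <- cumsum(!is.na(mydf$X)) > 0

# Hypothetical result name; interior and trailing NAs are kept
trimmed_left <- mydf[keep, ]
print(trimmed_left)
```

On this data the result should contain the same rows (3 to 10) as na.trim(mydf, sides = 'left'). Unlike na.trim, the sketch looks at only one column, so it would need adapting for data frames where several columns can be missing.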