Tags: r, tidyverse, data-wrangling

How to populate a zero for NA based on presence of a value in a different row - R


I have a data frame, produced by the code below, that shows the number of captures and recaptures per site for each visit. However, the sites and capture types were not all visited the same number of times, so simply converting all NAs to zeros does not make sense. What I want is to convert NAs to zeros in the recapture rows, but only for visits where capture data are present for that site.

For example, site "admin_pond" was visited 3 times with capture type "new", with captures each time. For "admin_pond" with capture type "recapture", only visit_3 had recaptures. So for visit_1 and visit_2 of "admin_pond"/"recapture" I would like to replace the NAs with zeros, while keeping visit_4, visit_5, and visit_6 as NA for "admin_pond"/"recapture". How would I do this for all sites and recapture observations?

data

data <- structure(list(site = c("wood_lab_pond", "phelps_pond", "admin_pond", 
"rv_pond", "admin_pond", "wood_lab_pond", "rv_pond", "tuttle_pond", 
"tuttle_pond", "vorisek_pond", "vorisek_pond", "phelps_pond"), 
    capture_type = c("new", "new", "new", "new", "recapture", 
    "recapture", "recapture", "new", "recapture", "new", "recapture", 
    "recapture"), visit_1 = c(2L, 4L, 9L, 1L, NA, NA, NA, 15L, 
    NA, 14L, NA, NA), visit_2 = c(4L, 3L, 15L, 7L, NA, NA, NA, 
    12L, 10L, 4L, 9L, NA), visit_3 = c(1L, 6L, 11L, 4L, 9L, 2L, 
    1L, 39L, NA, NA, NA, NA), visit_4 = c(NA, NA, NA, 13L, NA, 
    NA, NA, 21L, 10L, NA, NA, NA), visit_5 = c(NA, NA, NA, 27L, 
    NA, NA, 2L, 27L, 2L, NA, NA, NA), visit_6 = c(NA, NA, NA, 
    11L, NA, NA, NA, 19L, 1L, NA, NA, NA)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • I think it can be easily solved with {tidyverse}. I would do something like this:

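    library(tidyverse)
    
    data <- data %>% 
      group_by(site) %>% 
      # Put each site's "new" row directly before its "recapture" row.
      # Not needed for this data (it is already in that order), but lag() relies on it.
      arrange(site, capture_type) %>% 
      mutate(across(contains("visit"), 
                    # An NA becomes 0 only when the previous row in the group has a value
                    ~ifelse(is.na(.) &
                              !is.na(lag(.)), 0, .))) %>% 
      ungroup()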
    

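    This code returns a data frame in which each NA in a "recapture" row is replaced by 0 wherever the same site's "new" row has a value for that visit. All other NAs are left as they are.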
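
The lag() approach depends on the "new" row sitting directly above the "recapture" row within each site. As a possible variation, here is a minimal sketch that checks capture_type explicitly, so row order should not matter. The `toy` tibble and the `result` name are made up for illustration (a cut-down version of the "admin_pond" rows); it assumes one "new" and one "recapture" row per site, as in the question.

```r
library(dplyr)

# Hypothetical toy data: the "admin_pond" rows from the question, first four visits only
toy <- tibble::tibble(
  site         = c("admin_pond", "admin_pond"),
  capture_type = c("recapture", "new"),   # deliberately not sorted
  visit_1      = c(NA, 9L),
  visit_2      = c(NA, 15L),
  visit_3      = c(9L, 11L),
  visit_4      = c(NA_integer_, NA_integer_)
)

result <- toy %>%
  group_by(site) %>%
  mutate(across(
    starts_with("visit"),
    # Replace an NA with 0 only in "recapture" rows, and only when the
    # site's "new" row has a value for the same visit
    ~ if_else(
      capture_type == "recapture" &
        is.na(.x) &
        any(!is.na(.x[capture_type == "new"])),
      0L,
      .x
    )
  )) %>%
  ungroup()

result
```

Here visit_1 and visit_2 of the "recapture" row should become 0, while visit_4 stays NA for both rows because the site has no "new" value for that visit. Applying the same `mutate()` to the full `data` object from the question should work in the same way, since its visit columns are also integers.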