Search code examples
rbindtibble

Adding specified row for the data missing


I want to insert 'NA' value for specified Plot, time and Date and they are located randomly. I figure out how to do manually using add_row function but the main concern for me is I have tons of data and doing manually doesn't help. My data are in this format.

Plot Date Time Canopyheight
B1 10/22/2019 22 50
B1 10/22/2019 1 80
B1 10/22/2019 4 9

So my in each plot, there are 4 timestamps as 22,1,4 and 6 and sometimes there is missing time stamp like B1 10/22/2019 6 Na. I can add these rows by using the code below

  add_row(agg, Date = '10/21/2019', Plot = 'BG107B2', Time = 22,
          Canopyheight = NA, .before = 1)

but I have several dates and plots where I need to add the row. I have tried the following codes

test <- agg %>%
  mutate(ID2 = as.integer(factor(Plot, levels = unique(.$Plot)))) %>%
  split(f = .$ID2) %>%
  map_if(.p = function(x) unique(x$ID2) != unique(last(.)$ID2),
         ~bind_rows(.x, tibble(Time = unique(.x$Time), Canopyheight = NA,
                               ID2 = unique(.x$ID2)))) %>%
  bind_rows() %>%
  select(-ID2)

But still, I am not able to do it, is there any ways that I can automate this instead of the manual way?

Thanks and have a great day.


Solution

  • We can use complete from tidyr to complete the missing combination of Time for each Plot.

    tidyr::complete(df, Plot, Date, Time = c(22, 1, 4, 6))
    
    #  Plot  Date        Time Canopyheight
    #  <fct> <fct>      <dbl>        <int>
    #1 B1    10/22/2019     1           80
    #2 B1    10/22/2019     4            9
    #3 B1    10/22/2019     6           NA
    #4 B1    10/22/2019    22           50
    #5 B2    10/22/2019     1           NA
    #6 B2    10/22/2019     4            9
    #7 B2    10/22/2019     6           80
    #8 B2    10/22/2019    22           50
    

    data

    Included one more group of Plot for testing the solution.

    df <- structure(list(Plot = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("B1", 
    "B2"), class = "factor"), Date = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L), .Label = "10/22/2019", class = "factor"), Time = c(22L, 
    1L, 4L, 22L, 6L, 4L), Canopyheight = c(50L, 80L, 9L, 50L, 80L, 
    9L)), class = "data.frame", row.names = c(NA, -6L))