
How can I add zeros or NAs to each species that does not have a corresponding date in R?


I have a data frame that looks like this (although it has more species and dates):

Date Species Count
2022/06/01 AMAV 50
2022/06/05 AMAV 20
2022/06/07 SESA 10
2022/06/07 AMAV 8

I want every date to include every species, with a count of 0 if that species was not originally listed for that date. A species not being listed just means it was not detected, so its count should be zero.

So for this example, I would want it to look like:

Date Species Count
2022/06/01 AMAV 50
2022/06/01 SESA 0
2022/06/05 AMAV 20
2022/06/05 SESA 0
2022/06/07 SESA 10
2022/06/07 AMAV 8

So I want all of my species listed for each date.

I'm wondering if anyone knows a way to accomplish this in R. I honestly have no idea how to tackle this problem, so I would appreciate any tips!


Solution

  • This is essentially the classic problem of filling in missing dates in a series. Build a table containing every Date/Species combination, then left outer join the original data onto it.

    # All combinations
    full <- expand.grid(Date = unique(df$Date), Species = unique(df$Species))
    # Left outer join: keep every combination; Count is NA where df has no row
    res <- merge(full, df, by = c('Date', 'Species'), all.x = TRUE)
    res
    #        Date Species Count
    #1 2022/06/01    AMAV    50
    #2 2022/06/01    SESA    NA
    #3 2022/06/05    AMAV    20
    #4 2022/06/05    SESA    NA
    #5 2022/06/07    AMAV     8
    #6 2022/06/07    SESA    10
    

    Afterwards, if you prefer zeros, replace the NA values with the standard idiom res$Count[is.na(res$Count)] <- 0.
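For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and runs the same expand.grid + merge approach end to end, then recodes NA as 0. The variable names df, full and res are illustrative choices. One caveat: expand.grid only uses dates that already appear in df, so a survey date on which no species at all was recorded will not show up; such dates would have to be supplied separately.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps Date and Species as character
df <- data.frame(
  Date    = c("2022/06/01", "2022/06/05", "2022/06/07", "2022/06/07"),
  Species = c("AMAV", "AMAV", "SESA", "AMAV"),
  Count   = c(50, 20, 10, 8),
  stringsAsFactors = FALSE
)

# Every Date x Species combination that occurs anywhere in the data
full <- expand.grid(Date = unique(df$Date),
                    Species = unique(df$Species),
                    stringsAsFactors = FALSE)

# Left outer join: combinations absent from df get Count = NA
res <- merge(full, df, by = c("Date", "Species"), all.x = TRUE)

# A missing row means "not detected", so recode NA as 0
res$Count[is.na(res$Count)] <- 0

print(res)
```

Passing stringsAsFactors = FALSE to expand.grid keeps both key columns as plain character vectors, so the join matches on strings rather than mixing factors and characters.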