Search code examples
rgroup-bycountsummarize

How to count number of unique values in a categorical variable, by groups of multiple columns


I am having a terrible time figuring out this script. I am working with camera trap data, and have variables for camera name, species, and time of day species was captured on camera (day or night). I want to know for each camera, how many times each species occurred during the day. Ultimately I am looking for a variable, perc.day, of the percent of occurrences during the day for each species for each camera. perc.day would be # of "day" detections / all total detections for each species on each camera. I have been using dplyr and data.table.

Here is a sample of what the data looks like:

camera <- c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C')
species <- c('deer', 'deer', 'deer', 'coyote', 'deer', 'deer', 'deer', 'deer', 'coyote', 'coyote', 'coyote', 'bobcat')
time_cat <- c('day', 'day', 'night', 'day', 'night', 'day', 'day', 'day', 'night', 'day', 'night', 'night')

data <- data.frame(camera, species, time_cat)

And what I need it to look like:

camera <- c('A', 'A', 'B', 'C', 'C', 'C')
species <- c('deer', 'coyote', 'deer', 'deer', 'coyote', 'bobcat')
total.detections <- c('3', '1', '3', '1', '3', '1')
perc.day <- c('0.667', '0', '0.667', '100', '0.333', '0')

data.final <- data.frame(camera, species, total.detections, perc.day)

This code works up to the detections variable, but at a loss for how to get the perc.day variable

data.final <- 
  data %>%
  group_by(camera, species) %>%
  summarize(total.detections = n())

Thank you for any help or advice.


Solution

  • A final mutate and ungroup on top of your original solution does the trick:

    data %>%
      group_by(camera, species) %>%
      summarize(total.detections = n(),
                day_count = sum(time_cat == "day")) %>%
      ungroup() %>%
      mutate(per.day = day_count / total.detections)
    
    # # A tibble: 6 × 5
    #   camera species total.detections day_count per.day
    #   <chr>  <chr>              <int>     <int>   <dbl>
    # 1 A      coyote                 1         1   1    
    # 2 A      deer                   3         2   0.667
    # 3 B      deer                   3         2   0.667
    # 4 C      bobcat                 1         0   0    
    # 5 C      coyote                 3         1   0.333
    # 6 C      deer                   1         1   1