Count a string & sum them in a new column in R using dplyr?


I have a dataset with different types of observations across several "transects". I'm still pretty new to R and am struggling with the issue below.

I need to count the "nest" observations in each transect, but I get an error that makes me think I am not using the correct function. In the end, I want a new column called "nest_number" that holds, on every row, the number of observations equal to "nest" in that row's transect.

The data is in this format:

transect  observation
1A        nest
1A        NA
1A        nest
1A        vocalization
1A        NA
2A        nest
2A        NA
...       ...

Here is how I need the output to look:

transect  observation   nest_number
1A        nest          2
1A        NA            2
1A        nest          2
1A        vocalization  2
1A        NA            2
2A        nest          1
2A        NA            1
...       ...           ...

Here is the code I used:

dfNew <- df %>%
  group_by(transect) %>%
  mutate(number_nests = colSums(observation == "nest", na.rm = TRUE))

The error I get is:

'x' must be an array of at least two dimensions
The error occurred in group 1: transect = "1A".


Solution

  • Use sum rather than colSums. colSums expects an array with at least two dimensions (a data.frame or matrix), but observation == "nest" is a one-dimensional logical vector, so sum is the function that counts its TRUE values.

    library(dplyr)
    df %>% 
      group_by(transect) %>% 
      mutate(nest_number = sum(observation == "nest", na.rm = TRUE)) %>%
      ungroup()
    

    Output:

    # A tibble: 7 × 3
      transect observation  nest_number
      <chr>    <chr>              <int>
    1 1A       nest                   2
    2 1A       <NA>                   2
    3 1A       nest                   2
    4 1A       vocalization           2
    5 1A       <NA>                   2
    6 2A       nest                   1
    7 2A       <NA>                   1
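
For anyone who wants to reproduce this, here is a minimal sketch. It assumes the data consists only of the seven rows shown in the question (the elided "..." rows are left out), and the names `is_nest` and `result` are just illustrative. It first shows why colSums rejects the comparison (it has no dimensions), then runs the grouped sum.

```r
library(dplyr)

# Sample data rebuilt from the rows shown in the question;
# the elided "..." rows are not included (assumption).
df <- tibble(
  transect    = c("1A", "1A", "1A", "1A", "1A", "2A", "2A"),
  observation = c("nest", NA, "nest", "vocalization", NA, "nest", NA)
)

# The comparison yields a plain logical vector (NA where observation is NA).
# It has no dim attribute, which is what colSums() complains about.
is_nest <- df$observation == "nest"
dim(is_nest)                 # NULL
sum(is_nest, na.rm = TRUE)   # 3 (TRUE values across all rows, NAs dropped)

# Per-transect count, repeated on every row of its group
result <- df %>%
  group_by(transect) %>%
  mutate(nest_number = sum(observation == "nest", na.rm = TRUE)) %>%
  ungroup()

result$nest_number           # 2 2 2 2 2 1 1
```

Note that na.rm = TRUE matters here: without it, any group containing an NA observation would get NA as its count, because NA == "nest" evaluates to NA rather than FALSE.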