Search code examples
rdplyrcountingfactorslevels

Counting new levels of a factor per site (R)


I need to generate a table counting new levels of a factor per site. My code is like this

# Data creation
f = c("red", "green", "blue", "orange", "yellow")
f = factor(f)
d = data.frame(
  site = 1:10,
  color1= c(
    "red", "red", "green", "green",  "green", 
    "blue","green", "blue", "orange", "yellow"
  ),
  color2= c(
    "green", "green",  "green", "blue","green", 
    "blue", "orange", "yellow","red", "red"
  )
)
d$color1 = factor( d$color1 , levels = levels(f) )
d$color2 = factor( d$color2 , levels = levels(f) )
d

It shows me this table

my data

I need to count how many new colors are in every new site. Only count first time appearing, not duplicated. Resulting a table like this one.

looking for this output

Counting not duplicated colors per site is in this figure. Counting

Is there a dplyr way to find this output?


Solution

  • You can do:

    library(tidyverse)
    d %>%
      pivot_longer(cols = -site) %>%
      mutate(newColors = duplicated(value)) %>%
      group_by(site) %>%
      mutate(newColors = sum(!newColors)) %>%
      ungroup() %>%
      pivot_wider()
    

    which gives:

    # A tibble: 10 x 4
        site newColors color1 color2
       <int>     <int> <fct>  <fct> 
     1     1         2 red    green 
     2     2         0 red    green 
     3     3         0 green  green 
     4     4         1 green  blue  
     5     5         0 green  green 
     6     6         0 blue   blue  
     7     7         1 green  orange
     8     8         1 blue   yellow
     9     9         0 orange red   
    10    10         0 yellow red   
    

    Note that this differs for row 9 where you have a 1, but both colors (orange and red) already appeared in previous rows.