r, data-cleaning, data-wrangling

Yearly percent change of group members in R


I want to measure the attrition and growth of each group's membership, by group, in R.

My data:

year1 <- 
  tibble(people = c("Joe A", "Max X", "Sam M",  "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))

year2 <- 
  tibble(people = c("Joe A", "Sam M",  "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))

  • Group 1 lost Max but gained Jane, who moved from group 2.
  • Group 2 lost Jane but gained Mike, Jen, and Mohamad.

Is there a way to see how many people joined/left a group in each year and the percentage change from year to year?


Solution

  • Maybe there are easier options, but you could do:

    library(tidyverse)

    year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))
    
    year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
    
    # For each group, stack both years' rosters and compare membership
    map(.x = unique(year1$group),
        .f = ~ year1 |> 
          filter(group == .x) |> 
          mutate(year = 1) |> 
          bind_rows(year2 |> 
                      filter(group == .x) |> 
                      mutate(year = 2)) |> 
          summarize(group = unique(group),
                    # people in the group in year 2 but not in year 1
                    joined     = length(setdiff(people[year == 2], people[year == 1])),
                    # people in the group in year 1 but not in year 2
                    left       = length(setdiff(people[year == 1], people[year == 2])),
                    n_year1    = sum(year == 1),
                    n_year2    = sum(year == 2),
                    # percent change in headcount, relative to year 1
                    pct_change = (n_year2 - n_year1) / n_year1 * 100)) |> 
      bind_rows()
    
    # A tibble: 2 × 6
      group joined  left n_year1 n_year2 pct_change
      <dbl>  <int> <int>   <int>   <int>      <dbl>
    1     1      1     1       3       3          0
    2     2      3     1       2       4        100
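
One possible alternative is sketched below. It is not part of the answer above. It uses a single `full_join()` in place of the per-group `map()`. The flag columns `in_y1` and `in_y2` and the result name `changes` are made up for illustration. The sketch assumes each person appears at most once per year and that percent change means headcount change relative to year 1.

```r
library(dplyr)
library(tibble)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"),
                group  = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K",
                           "Mike K", "Jen G", "Mohamad T"),
                group  = c(1, 1, 1, 2, 2, 2, 2))

# Flag each (person, group) pair by the year(s) in which it occurs.
# in_y1 / in_y2 are hypothetical helper columns, not from the original answer.
changes <- full_join(
  mutate(year1, in_y1 = TRUE),
  mutate(year2, in_y2 = TRUE),
  by = c("people", "group")
) |>
  # Pairs missing from one year come back as NA after the join
  mutate(in_y1 = coalesce(in_y1, FALSE),
         in_y2 = coalesce(in_y2, FALSE)) |>
  group_by(group) |>
  summarize(
    joined     = sum(in_y2 & !in_y1),  # in the group in year 2 only
    left       = sum(in_y1 & !in_y2),  # in the group in year 1 only
    n_year1    = sum(in_y1),
    n_year2    = sum(in_y2),
    # Assumed definition: headcount change relative to year 1, in percent
    pct_change = (n_year2 - n_year1) / n_year1 * 100
  )

changes
```

Because the join key includes `group`, a person who switches groups counts as leaving one group and joining the other. Unlike the `map()` version, a group that exists only in year 2 would also appear here; its `pct_change` would be `Inf`, since it divides by a year-1 headcount of zero.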