Search code examples
rdplyrgroup-bybinary

Binary to identify the first ever observation by group and keep the value after this period in R


I have a data frame like df:

df <- data.frame(year = c("2000", "2000", "2000", "2002", "2007", "2001", "2002", "2003", "2007"), 
                 id = c("X", "X", "X", "X", "X", "Z", "Z", "Z", "Z"), 
                 product = c("apple","orange", "melon", "orange", "orange", "orange", "orange", "orange", "truffels"), 
                 market = c("CHN", "USA", "USA", "CAN", "USA", "USA", "USA", "USA", "ECU"), 
                 value = c(1, 2, 3, 4, 5, 6, 7, 8, 9))

I need to build a variable core (group by id) that takes value 1 if a product(s) coincides with a first-ever observed for an id, like in df_new:

df_new <- data.frame(year = c("2000", "2000", "2000", "2002", "2007", "2001", "2002", "2003", "2007"), 
                 id = c("X", "X", "X", "X", "X", "Z", "Z", "Z", "Z"), 
                 product = c("apple","orange", "melon", "orange", "orange", "orange", "orange", "orange", "truffels"), 
                 market = c("CHN", "USA", "USA", "CAN", "USA", "USA", "USA", "USA", "ECU"), 
                 value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 core= c(1, 1, 1, 1, 1, 1, 1, 1, 0)
)

I have tried df_new <- df %>% group_by(product, id) %>% mutate(core = +(year == min(year))), but it only gives a binary variable that takes value 1 the first time an id observes a given product (at different years).

Any idea using dplyr?


Solution

  • You'd need to group by id only, and then check if the product exist among all the products with the lowest year (if I understand you correctly). I.e.

    library(dplyr)
    
    df |>
      group_by(id) |>
      mutate(FP = +(product %in% product[year == min(year)])) |>
      ungroup()
    

    Output:

    # A tibble: 9 × 6
      year  id    product  market value    FP
      <chr> <chr> <chr>    <chr>  <dbl> <int>
    1 2000  X     apple    CHN        1     1
    2 2000  X     orange   USA        2     1
    3 2000  X     melon    USA        3     1
    4 2002  X     orange   CAN        4     1
    5 2007  X     orange   USA        5     1
    6 2001  Z     orange   USA        6     1
    7 2002  Z     orange   USA        7     1
    8 2003  Z     orange   USA        8     1
    9 2007  Z     truffels ECU        9     0