I have a data frame like df:
df <- data.frame(year = c("2000", "2000", "2000", "2002", "2007", "2001", "2002", "2003", "2007"),
id = c("X", "X", "X", "X", "X", "Z", "Z", "Z", "Z"),
product = c("apple","orange", "melon", "orange", "orange", "orange", "orange", "orange", "truffels"),
market = c("CHN", "USA", "USA", "CAN", "USA", "USA", "USA", "USA", "ECU"),
value = c(1, 2, 3, 4, 5, 6, 7, 8, 9))
I need to build a variable core (group by id) that takes value 1 if a product(s) coincides with a first-ever observed for an id, like in df_new
:
df_new <- data.frame(year = c("2000", "2000", "2000", "2002", "2007", "2001", "2002", "2003", "2007"),
id = c("X", "X", "X", "X", "X", "Z", "Z", "Z", "Z"),
product = c("apple","orange", "melon", "orange", "orange", "orange", "orange", "orange", "truffels"),
market = c("CHN", "USA", "USA", "CAN", "USA", "USA", "USA", "USA", "ECU"),
value = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
core= c(1, 1, 1, 1, 1, 1, 1, 1, 0)
)
I have tried df_new <- df %>% group_by(product, id) %>% mutate(core = +(year == min(year)))
, but it only gives a binary variable that takes value 1 the first time an id observes a given product (at different years).
Any idea using dplyr?
You'd need to group by id only, and then check if the product exist among all the products with the lowest year (if I understand you correctly). I.e.
library(dplyr)
df |>
group_by(id) |>
mutate(FP = +(product %in% product[year == min(year)])) |>
ungroup()
Output:
# A tibble: 9 × 6
year id product market value FP
<chr> <chr> <chr> <chr> <dbl> <int>
1 2000 X apple CHN 1 1
2 2000 X orange USA 2 1
3 2000 X melon USA 3 1
4 2002 X orange CAN 4 1
5 2007 X orange USA 5 1
6 2001 Z orange USA 6 1
7 2002 Z orange USA 7 1
8 2003 Z orange USA 8 1
9 2007 Z truffels ECU 9 0