I have panel data here. For a single 'name', I would like to compare a new day's products with all previous days' products. If they are different, then the new created value in its row would be 1, otherwise 0. I roughly know that mutate() is needed, but do not know how to compare all products from day 2 with all from day 1. Any help would be greatly appreciated!
This is an example of what I mean. The product 'a' and 'b' from day 2 are redundant with day 1's, thus the value for 'new_product' is 0. While the value for the 6th row is 1, because product d does not repeat from day 1.
data <- matrix(c('name1', 'name1', 'name1',
'name1', 'name1', 'name1',
'1', '1', '1', '2', '2', '2',
'a', 'b', 'c',
'a', 'b', 'd'), nrow = 6, ncol = 3)
colnames(data) <- c('name', 'day', 'product')
data
#> name day product
#> [1,] "name1" "1" "a"
#> [2,] "name1" "1" "b"
#> [3,] "name1" "1" "c"
#> [4,] "name1" "2" "a"
#> [5,] "name1" "2" "b"
#> [6,] "name1" "2" "d"
ideal_data <- matrix(c('name1', 'name1', 'name1',
'name1', 'name1', 'name1',
'1', '1', '1', '2', '2', '2',
'a', 'b', 'c',
'a', 'b', 'd',
'0', '0', '0', '0', '0', '1'), nrow = 6, ncol = 4)
colnames(ideal_data) <- c('name', 'day', 'product', 'new_product')
ideal_data
#> name day product new_product
#> [1,] "name1" "1" "a" "0"
#> [2,] "name1" "1" "b" "0"
#> [3,] "name1" "1" "c" "0"
#> [4,] "name1" "2" "a" "0"
#> [5,] "name1" "2" "b" "0"
#> [6,] "name1" "2" "d" "1"
One option would be to check when and whether the product was listed first:
library(dplyr, warn = FALSE)
data |>
as.data.frame() |>
mutate(
new_product = +(day > 1 & row_number() == 1),
.by = product
)
#> name day product new_product
#> 1 name1 1 a 0
#> 2 name1 1 b 0
#> 3 name1 1 c 0
#> 4 name1 2 a 0
#> 5 name1 2 b 0
#> 6 name1 2 d 1