Search code examples
rtidyversedata-manipulation

Creating new column, with column's value depending on whether the row's value is redundant with other rows or not


I have panel data here. For a single 'name', I would like to compare a new day's products with all previous days' products. If they are different, then the new created value in its row would be 1, otherwise 0. I roughly know that mutate() is needed, but do not know how to compare all products from day 2 with all from day 1. Any help would be greatly appreciated!

This is an example of what I mean. The product 'a' and 'b' from day 2 are redundant with day 1's, thus the value for 'new_product' is 0. While the value for the 6th row is 1, because product d does not repeat from day 1.

data <- matrix(c('name1', 'name1', 'name1',
                 'name1', 'name1', 'name1',
                 '1', '1', '1', '2', '2', '2',
                 'a', 'b', 'c', 
                 'a', 'b', 'd'), nrow = 6, ncol = 3)

colnames(data) <- c('name', 'day', 'product')
data
#>      name    day product
#> [1,] "name1" "1" "a"    
#> [2,] "name1" "1" "b"    
#> [3,] "name1" "1" "c"    
#> [4,] "name1" "2" "a"    
#> [5,] "name1" "2" "b"    
#> [6,] "name1" "2" "d"


ideal_data <- matrix(c('name1', 'name1', 'name1',
                 'name1', 'name1', 'name1',
                 '1', '1', '1', '2', '2', '2',
                 'a', 'b', 'c', 
                 'a', 'b', 'd',
                 '0', '0', '0', '0', '0', '1'), nrow = 6, ncol = 4)

colnames(ideal_data) <- c('name', 'day', 'product', 'new_product')
ideal_data
#>      name    day product new_product
#> [1,] "name1" "1" "a"     "0"        
#> [2,] "name1" "1" "b"     "0"        
#> [3,] "name1" "1" "c"     "0"        
#> [4,] "name1" "2" "a"     "0"        
#> [5,] "name1" "2" "b"     "0"        
#> [6,] "name1" "2" "d"     "1"

Solution

  • One option would be to check when and whether the product was listed first:

    library(dplyr, warn = FALSE)
    
    data |>
      as.data.frame() |>
      mutate(
        new_product = +(day > 1 & row_number() == 1),
        .by = product
      )
    #>    name day product new_product
    #> 1 name1   1       a           0
    #> 2 name1   1       b           0
    #> 3 name1   1       c           0
    #> 4 name1   2       a           0
    #> 5 name1   2       b           0
    #> 6 name1   2       d           1