Search code examples
rdataframedata-sciencepreprocessor

Add a new row on basis of column values in R


I am trying to get my head around this simple preprocessing task in R. I am trying to get the ideal value column as a row titled ideal in Product ID. I think the image below will shed more light on it.

enter image description here

> dput(df)

structure(list(Consumer = c(43L, 43L, 43L, 43L, 43L, 41L, 41L, 
41L, 41L, 41L), Product = c(106L, 992L, 366L, 257L, 548L, 106L, 
992L, 366L, 257L, 548L), Firm = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 
0L, 0L, 0L), Juicy = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L
), Sweet = c(0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L), Ideal_Firm = c(1L, 
1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L), Ideal_Juicy = c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Ideal_Sweet = c(1L, 1L, 1L, 
1L, 1L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-10L))

Solution

  • Below is a solution:

    df <- data.frame(
      Consumer = c(rep(43, 5), rep(41, 5)),
      Product = rep(sample(100:900,size = 5, replace = F), 2),
      Firm = c(sample(rep(0:1, 5), replace = T)),
      Juicy = c(sample(rep(0:1, 5), replace = T)),
      Sweet = c(sample(rep(0:1, 5), replace = T)),
      Ideal_Firm = 1, 
      Ideal_Juicy = c(rep(1, 5), rep(2, 5)), 
      Ideal_Sweet = c(rep(1, 5), rep(0, 5))
    )
    
    library(dplyr)
    df <- merge(
      # Bind the observation...
      df %>% select(Consumer:Sweet) %>% 
        pivot_wider(id_cols = Consumer,names_from = Product,values_from = Firm:Sweet),
      # ... to the ideal
      df %>% group_by(Consumer) %>% 
        # Here I put mean, but it could be 1, median, min, max... If I understood correctly, it has to be 1?
        summarise(across(Ideal_Firm:Ideal_Sweet, ~mean(.x))) %>%
        # Rename so the column name has the form [characteristic]_ideal instead of Ideal_[characteristic]
        # remove prefix Ideal_ ...
        rename_at(.vars = vars(starts_with("Ideal_")),
                  .funs = funs(sub("Ideal_", "", .))) %>%
        # ... add _Ideal as a suffix instead
        rename_at(vars(-Consumer), function(x) paste0(x,"_Ideal"))
    )
    
    # Then manipulate to get into long form again
    df <- df %>% pivot_longer(cols = !Consumer) %>%
      separate(name, c("Characteristic", "Product")) %>%
      pivot_wider(id_cols = Consumer:Product, names_from = Characteristic, values_from = value)
    df