Tags: r, duplicates, tidyverse, pivot-table, dummy-variable

Widen a dataset in RStudio with tidyverse


I have a large dataset of order data. Some customers repurchased the item, while others did not.

The simple sample dataset is as follows.

data_sample <- c(rep("JAck", 3), "Ann", rep("Alice", 2), "Mark", "Grace")
time_sample <- c("2018-10-03 19:51:51",
                 "2018-10-05 19:55:15",
                 "2018-11-19 06:26:02",
                 "2019-01-06 15:24:30",
                 "2018-10-01 15:15:43",
                 "2018-10-05 11:12:54",
                 "2019-01-27 00:49:19",
                 "2018-10-03 10:10:34")

dat_sample <- cbind(data_sample, time_sample)


This means Jack ordered the item three times, so he repurchased; Alice ordered it twice; Ann, Mark and Grace each ordered it only once, so they did not repurchase. How can I create a new variable, say whether_to_purchase, for these customers, where 1 means repurchase and 0 means no repurchase?

I want to transform dat_sample into the data frame shown below (mocked up in Excel). How can I do this in R with tidyverse? I know I first need to determine whether each person repurchased, then count how many orders that person placed in total, and finally reshape the long dataset into a wide one. However, I am having trouble implementing these steps. Any suggestions or help?

Thanks so much.

[Image: desired wide-format data frame]


Solution

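    library(tidyverse)

    # cbind() returns a character matrix, so convert it to a tibble first
    dat_sample %>%
      as_tibble() %>%
      group_by(data_sample) %>%
      arrange(time_sample) %>%     # unnecessary if sorted already
      mutate(instance = row_number()) %>%  # order number within each customer
      ungroup() %>%
      pivot_wider(names_from = instance, values_from = time_sample) %>%
      mutate(reordered = 1 * !is.na(`2`))  # 0 if 2 NA; 1 if not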
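
The question also asks for the total number of orders per customer. A minimal sketch of one way to get that alongside the repurchase flag, assuming the data is built as a data frame from the start. The column names customer, order_time, n_orders and repurchased, and the order_ prefix on the widened columns, are illustrative choices and do not come from the original post:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, built as a data frame instead of a cbind() matrix.
# Column names here are hypothetical.
dat <- data.frame(
  customer = c(rep("JAck", 3), "Ann", rep("Alice", 2), "Mark", "Grace"),
  order_time = c("2018-10-03 19:51:51",
                 "2018-10-05 19:55:15",
                 "2018-11-19 06:26:02",
                 "2019-01-06 15:24:30",
                 "2018-10-01 15:15:43",
                 "2018-10-05 11:12:54",
                 "2019-01-27 00:49:19",
                 "2018-10-03 10:10:34"),
  stringsAsFactors = FALSE
)

wide <- dat %>%
  group_by(customer) %>%
  arrange(order_time, .by_group = TRUE) %>%
  mutate(
    instance    = row_number(),        # 1st, 2nd, 3rd order of this customer
    n_orders    = n(),                 # total orders placed by this customer
    repurchased = as.integer(n() > 1)  # 1 if more than one order, else 0
  ) %>%
  ungroup() %>%
  pivot_wider(
    names_from   = instance,
    values_from  = order_time,
    names_prefix = "order_"            # avoids bare numeric column names
  )

print(wide)
```

Computing n_orders and repurchased before widening means the flag does not depend on a backticked column such as `2` existing, and the prefixed names (order_1, order_2, ...) are easier to reference later.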