Search code examples
rindexingcounttidyversesummary

Finding the "nth attempt" using Tidyverse in R


I have a dataset that shows the logins by date and unique_session_code for each user that visits the webpage of a bakery. I am trying to come a code that allows me to  determine if this is the first, second, third and so on... visit of this customer in other words the "nth_visit" and I am really looking for a solution using the tidyverse package, but if there is no way to do that using tidyverse then all other solutions are just as appreciated. thanks for the help guys!  

USER LOG_DATE TX_REV
ALEX 30/04/2021 uyjd
MARY 30/04/2021 dn89
DOLLY 30/04/2021 nqmd0
FRANK 29/04/2021 nsj456
MARY 29/04/2021 zlpa2
DOLLY 29/04/2021 s09dm
ALEX 28/04/2021 jds02
MARY 28/04/2021 kqos98
DOLLY 28/04/2021 uiskk0

This would be my desired result:

USER LOG_DATE TX_REV nth_visit
ALEX 28/04/2021 jds02 1
ALEX 30/04/2021 uyjd 2
DOLLY 28/04/2021 uiskk0 1
DOLLY 29/04/2021 s09dm 2
DOLLY 30/04/2021 nqmd0 3
FRANK 29/04/2021 nsj456 1
MARY 28/04/2021 kqos98 1
MARY 29/04/2021 zlpa2 2
MARY 30/04/2021 dn89 3

A data frame that has the "nth_visit" to the website by each attempt of the visitor

What have I done?

I have tried to arrange the date in chronological order and group by ID, and reviewing some documentation I tried to use the function rank_dense but it does not give me the results that I am looking for..not even close.

Thank you so much guys

code:

df<-tibble::tribble(
  ~USER,    ~LOG_DATE,  ~TX_REV,
  "ALEX",       "30/04/2021",   "uyjd",
  "MARY",       "30/04/2021",   "dn89",
  "DOLLY",  "30/04/2021",   "nqmd0",
  "FRANK",  "29/04/2021",   "nsj456",
  "MARY",       "29/04/2021",   "zlpa2",
  "DOLLY",  "29/04/2021",   "s09dm",
  "ALEX",       "28/04/2021",   "jds02",
  "MARY",       "28/04/2021",   "kqos98",
  "DOLLY",  "28/04/2021",   "uiskk0")

Solution

  • Sorting by date will help, otherwise the visit numbering will be in order of appearance in the data rather than chronological.

    df %>%
      mutate(LOG_DATE_DT = as.Date(LOG_DATE, "%d/%m/%Y")) %>%
      arrange(USER, LOG_DATE_DT) %>%
      group_by(USER) %>%
      mutate(nth_visit = row_number()) %>%
      ungroup()
    
    # A tibble: 9 x 5
      USER  LOG_DATE   TX_REV LOG_DATE_DT nth_visit
      <chr> <chr>      <chr>  <date>          <int>
    1 ALEX  28/04/2021 jds02  2021-04-28          1
    2 ALEX  30/04/2021 uyjd   2021-04-30          2
    3 DOLLY 28/04/2021 uiskk0 2021-04-28          1
    4 DOLLY 29/04/2021 s09dm  2021-04-29          2
    5 DOLLY 30/04/2021 nqmd0  2021-04-30          3
    6 FRANK 29/04/2021 nsj456 2021-04-29          1
    7 MARY  28/04/2021 kqos98 2021-04-28          1
    8 MARY  29/04/2021 zlpa2  2021-04-29          2
    9 MARY  30/04/2021 dn89   2021-04-30          3