Search code examples
rindexingcounttidyversesummary

Finding the "nth attempt" using Tidyverse in R


I have a dataset that shows the logins by date and unique_session_code for each user that visits the webpage of a bakery. I am trying to write code that allows me to determine if this is the first, second, third and so on... visit of this customer. In other words the "nth_visit".

I am really looking for a solution using the tidyverse package, but if there is no way to do that using tidyverse then all other solutions are just as welcome.

USER LOG_DATE TX_REV
ALEX 30/04/2021 uyjd
MARY 30/04/2021 dn89
DOLLY 30/04/2021 nqmd0
FRANK 29/04/2021 nsj456
MARY 29/04/2021 zlpa2
DOLLY 29/04/2021 s09dm
ALEX 28/04/2021 jds02
MARY 28/04/2021 kqos98
DOLLY 28/04/2021 uiskk0

This would be my desired result:

USER LOG_DATE TX_REV nth_visit
ALEX 28/04/2021 jds02 1
ALEX 30/04/2021 uyjd 2
DOLLY 28/04/2021 uiskk0 1
DOLLY 29/04/2021 s09dm 2
DOLLY 30/04/2021 nqmd0 3
FRANK 29/04/2021 nsj456 1
MARY 28/04/2021 kqos98 1
MARY 29/04/2021 zlpa2 2
MARY 30/04/2021 dn89 3

A data frame that has the "nth_visit" to the website by each attempt of the visitor

What have I done?

I have tried to arrange the date in chronological order and group by ID, and reviewing some documentation I tried to use the function rank_dense but it does not give me the results that I am looking for, not even close.

Code:

df<-tibble::tribble(
  ~USER,    ~LOG_DATE,  ~TX_REV,
  "ALEX",       "30/04/2021",   "uyjd",
  "MARY",       "30/04/2021",   "dn89",
  "DOLLY",  "30/04/2021",   "nqmd0",
  "FRANK",  "29/04/2021",   "nsj456",
  "MARY",       "29/04/2021",   "zlpa2",
  "DOLLY",  "29/04/2021",   "s09dm",
  "ALEX",       "28/04/2021",   "jds02",
  "MARY",       "28/04/2021",   "kqos98",
  "DOLLY",  "28/04/2021",   "uiskk0")

Solution

  • Sorting by date will help, otherwise the visit numbering will be in order of appearance in the data rather than chronological.

    df %>%
      mutate(LOG_DATE_DT = as.Date(LOG_DATE, "%d/%m/%Y")) %>%
      arrange(USER, LOG_DATE_DT) %>%
      group_by(USER) %>%
      mutate(nth_visit = row_number()) %>%
      ungroup()
    
    # A tibble: 9 x 5
      USER  LOG_DATE   TX_REV LOG_DATE_DT nth_visit
      <chr> <chr>      <chr>  <date>          <int>
    1 ALEX  28/04/2021 jds02  2021-04-28          1
    2 ALEX  30/04/2021 uyjd   2021-04-30          2
    3 DOLLY 28/04/2021 uiskk0 2021-04-28          1
    4 DOLLY 29/04/2021 s09dm  2021-04-29          2
    5 DOLLY 30/04/2021 nqmd0  2021-04-30          3
    6 FRANK 29/04/2021 nsj456 2021-04-29          1
    7 MARY  28/04/2021 kqos98 2021-04-28          1
    8 MARY  29/04/2021 zlpa2  2021-04-29          2
    9 MARY  30/04/2021 dn89   2021-04-30          3