Search code examples
rdplyriterationtidyr

Tidyr/Dplyr: Iterative counts until a specific value is found. "How many records were added before value X was entered?"


I have a dataset of call logs that I need to do an iterative count until a specific value is found.

The idea is answering the question "How many records were entered until this specific value is seen?" In my real data the question would be phrased "How many times did we call each person before they answered their phone?"

My data would have 3 columns:

  1. Phone Number called
  2. Date of call
  3. Did person answer? (this column is a text column answers could be YES, NO, Answering Machine, VoiceMail, etc)

Here is 6 rows of sample data:

 data <- data.frame(PhoneNum = c("1112223333","1112223333","1112223333","1112223334","1112223334","1112223334"),
                   Answered = c("Voice Mail", "Answering Machine", "Member Answered", "Voice Mail", "Answering Machine", "Member Answered"),
                   Date = Sys.Date() - 1:6)

What I am trying to do is get an count of how many times each phone number was called before a member answered their phone. Output will hopefully look like this:

data <- data.frame(PhoneNum = c("1112223333","1112223333","1112223333","1112223334","1112223334","1112223334"),
                   Answered = c("Voice Mail", "Answering Machine", "Member Answered", "Voice Mail", "Answering Machine", "Member Answered"),
                   Date = Sys.Date() - 1:6,
                   CallAttempts = c("1", "2", "3", "1", "2", "3")) %>% filter(str_detect(Answered, "Answered"))

however I don't want to lose all the other rows if I can avoid it..

Is the best case scenario to do what I have sampled here and create a new DF with just the 'answered' calls?

Ultimately I am trying to determine 1. How many call attempts were made total, 2. How many of those calls ended with an 'impression' (any way a member would know our company called and they need to call us back.. some of our "answered" values will be 'bad number' or otherwise unable to leave a message for the member), and 3. How many attempts were made to each member before they answered the phone (need to get an average number of attempts per answer)


Solution

  • df of answered calls (and which call # for that phonenum): (I am assuming that Member Answered or Answering Machine should both be counted as answering, based on the having the fragment "Answer", but feel free to define as appropriate.

    data |>
      arrange(Date) |>
      mutate(call_num = row_number(), .by = PhoneNum) |>
      filter(str_detect(Answered, "Answer"))
    

    Or all those stats:

    data |>
      arrange(Date) |>
      mutate(call_num = row_number(), 
             answered = stringr::str_detect(Answered, "Answer"),
             num_calls = n(),
             impressions = sum(answered),
             first_answer = min(call_num[answered]),
             .by = PhoneNum)
    
        PhoneNum          Answered       Date call_num answered num_calls impressions first_answer
    1 1112223334   Member Answered 2023-10-06        1     TRUE         3           2            1
    2 1112223334 Answering Machine 2023-10-07        2     TRUE         3           2            1
    3 1112223334        Voice Mail 2023-10-08        3    FALSE         3           2            1
    4 1112223333   Member Answered 2023-10-09        1     TRUE         3           2            1
    5 1112223333 Answering Machine 2023-10-10        2     TRUE         3           2            1
    6 1112223333        Voice Mail 2023-10-11        3    FALSE         3           2            1