I have a dataset of call logs that I need to do an iterative count until a specific value is found.
The idea is answering the question "How many records were entered until this specific value is seen?" In my real data the question would be phrased "How many times did we call each person before they answered their phone?"
My data would have 3 columns:
Here is 6 rows of sample data:
data <- data.frame(PhoneNum = c("1112223333","1112223333","1112223333","1112223334","1112223334","1112223334"),
Answered = c("Voice Mail", "Answering Machine", "Member Answered", "Voice Mail", "Answering Machine", "Member Answered"),
Date = Sys.Date() - 1:6)
What I am trying to do is get an count of how many times each phone number was called before a member answered their phone. Output will hopefully look like this:
data <- data.frame(PhoneNum = c("1112223333","1112223333","1112223333","1112223334","1112223334","1112223334"),
Answered = c("Voice Mail", "Answering Machine", "Member Answered", "Voice Mail", "Answering Machine", "Member Answered"),
Date = Sys.Date() - 1:6,
CallAttempts = c("1", "2", "3", "1", "2", "3")) %>% filter(str_detect(Answered, "Answered"))
however I don't want to lose all the other rows if I can avoid it..
Is the best case scenario to do what I have sampled here and create a new DF with just the 'answered' calls?
Ultimately I am trying to determine 1. How many call attempts were made total, 2. How many of those calls ended with an 'impression' (any way a member would know our company called and they need to call us back.. some of our "answered" values will be 'bad number' or otherwise unable to leave a message for the member), and 3. How many attempts were made to each member before they answered the phone (need to get an average number of attempts per answer)
df of answered calls (and which call # for that phonenum): (I am assuming that Member Answered or Answering Machine should both be counted as answering, based on the having the fragment "Answer", but feel free to define as appropriate.
data |>
arrange(Date) |>
mutate(call_num = row_number(), .by = PhoneNum) |>
filter(str_detect(Answered, "Answer"))
Or all those stats:
data |>
arrange(Date) |>
mutate(call_num = row_number(),
answered = stringr::str_detect(Answered, "Answer"),
num_calls = n(),
impressions = sum(answered),
first_answer = min(call_num[answered]),
.by = PhoneNum)
PhoneNum Answered Date call_num answered num_calls impressions first_answer
1 1112223334 Member Answered 2023-10-06 1 TRUE 3 2 1
2 1112223334 Answering Machine 2023-10-07 2 TRUE 3 2 1
3 1112223334 Voice Mail 2023-10-08 3 FALSE 3 2 1
4 1112223333 Member Answered 2023-10-09 1 TRUE 3 2 1
5 1112223333 Answering Machine 2023-10-10 2 TRUE 3 2 1
6 1112223333 Voice Mail 2023-10-11 3 FALSE 3 2 1