Search code examples
rdataframegroupingdata-cleaning

R: Calculating the Number of Phone Calls Every Hour


I have a dataset that looks like the following:

Call No.  Arrival Time  Call Length (in hrs) ...
================================================
1         0.01          0.061 
2         0.08          0.05
3         0.10                                   (Busy/Unanswered)
4         0.15          0.42
...
10        1.03          0.36
11        1.09          0.72
...

I want to count the number of phone calls each hour (e.g. number of successful phone calls from arrival times [0, 1), [1, 2), [2, 3), etc.

There are some empty values in the call length column, indicating that the phone line was busy, so the call went unanswered. I basically want to count the nonempty occurrences of the call length and group them by hour by summing them. How can I do this using dataframe operations in R?


Solution

  • Chiming in with a base R solution

    dat <- data.frame(`Call No.` = c(1,2,3,4,10,11), 
               `Arrival Time` = c(0.01,0.08,0.10,0.15,1.03,1.09),
               `Call Length (in hrs)` = c(0.61, 0.05, NA, 0.42, 0.36, 0.72), 
               check.names = F) # to keep the spaces
    # filter out NAs
    dat2 <- dat[complete.cases(dat),]
    # add an hour variable
    dat2$hour <- floor(dat2$`Arrival Time`)
    
    # for fun, create a function that takes in a df
    count_and_sum <- function(df){
      return(data.frame(hour = df$hour[1], # assumes we will pass it dfs with 1 hour only
                 answered_calls =   nrow(df),
                 total_call_time_hrs = sum(df$`Call Length (in hrs)`)))
    }
    
    # use split to separate the data into a list of data.frames by hour 
    # added a step but might be better to do in one row for memory
    splitted <- split(dat2, dat2$hour, drop= T) 
    # use sapply to apply our function to each element of the splitted list
    # and transpose to make the output the right orientation
    t(sapply(splitted, count_and_sum))
    #  hour answered_calls total_call_time_hrs
    #0 0    3              1.08               
    #1 1    2              1.08