I have a dataset that looks like the following:
Call No. Arrival Time Call Length (in hrs) ...
================================================
1 0.01 0.061
2 0.08 0.05
3 0.10 (Busy/Unanswered)
4 0.15 0.42
...
10 1.03 0.36
11 1.09 0.72
...
I want to count the number of phone calls each hour (e.g. number of successful phone calls from arrival times [0, 1), [1, 2), [2, 3), etc.
There are some empty values in the call length column, indicating that the phone line was busy, so the call went unanswered. I basically want to count the nonempty occurrences of the call length and group them by hour by summing them. How can I do this using dataframe operations in R?
Chiming in with a base R solution
dat <- data.frame(`Call No.` = c(1,2,3,4,10,11),
`Arrival Time` = c(0.01,0.08,0.10,0.15,1.03,1.09),
`Call Length (in hrs)` = c(0.61, 0.05, NA, 0.42, 0.36, 0.72),
check.names = F) # to keep the spaces
# filter out NAs
dat2 <- dat[complete.cases(dat),]
# add an hour variable
dat2$hour <- floor(dat2$`Arrival Time`)
# for fun, create a function that takes in a df
count_and_sum <- function(df){
return(data.frame(hour = df$hour[1], # assumes we will pass it dfs with 1 hour only
answered_calls = nrow(df),
total_call_time_hrs = sum(df$`Call Length (in hrs)`)))
}
# use split to separate the data into a list of data.frames by hour
# added a step but might be better to do in one row for memory
splitted <- split(dat2, dat2$hour, drop= T)
# use sapply to apply our function to each element of the splitted list
# and transpose to make the output the right orientation
t(sapply(splitted, count_and_sum))
# hour answered_calls total_call_time_hrs
#0 0 3 1.08
#1 1 2 1.08