Really simple question, but somehow I am stuck. I have panel data on users' daily tasks. I want to find out how many tasks one user does on average, and how long one user takes per task on average. I would also like to plot this data if possible. I ran the usual descriptive statistics, but they are not exactly what I need. The data looks somewhat like this: user (1, 1, 1, 2, 2, 3), task (1, 1, 2, 3, 4, 5), day (1, 2, 1, 1, 2, 1), task creation (1, 1, 1, 4, 4, 3), deadline (5, 5, 5, 9, 9, 4).
id_task id_user day completion_yesno day_created has_deadline deadline created_before active overdue completed_before
16416 37033 5272 61 0 61 1 172 0 0 0 0
16417 37033 5272 62 0 61 1 172 2 2 0 0
16418 37033 5272 63 0 61 1 172 2 2 0 0
16419 37033 5272 64 0 61 1 172 2 2 0 0
16420 37033 5272 65 0 61 1 172 2 2 0 0
16421 37033 5272 66 0 61 1 172 2 2 0 0
16422 37033 5272 67 0 61 1 172 2 2 0 0
16423 37033 5272 68 0 61 1 172 2 2 0 0
16424 37033 5272 69 0 61 1 172 2 2 0 0
16425 37033 5272 70 0 61 1 172 2 2 0 0
16426 37033 5272 71 0 61 1 172 2 2 0 0
16427 37033 5272 72 0 61 1 172 2 2 0 0
16428 37033 5272 73 0 61 1 172 2 2 0 0
16429 37033 5272 74 0 61 1 172 2 2 0 0
16430 37033 5272 75 0 61 1 172 2 2 0 0
16431 37033 5272 76 0 61 1 172 2 2 0 0
16432 37033 5272 77 0 61 1 172 2 2 0 0
16433 37033 5272 78 0 61 1 172 2 2 0 0
16434 37033 5272 79 0 61 1 172 2 2 0 0
16435 37033 5272 80 0 61 1 172 2 2 0 0
In this case one user would work on 2 tasks on average, but I only found that out by counting.
Keep only the user, task and completion columns. Remove duplicated rows, then group by user and sum the completed tasks for each user:
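library(dplyr)

df_by_user <- df %>%
  select(id_user, id_task, completion_yesno) %>%  # keep user, task and completion flag
  unique() %>%                                    # drop the repeated daily rows
  group_by(id_user) %>%
  summarise(n = sum(completion_yesno))            # completed tasks per user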
Then compute the average:
mean(df_by_user$n)
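The question also asks about time per task and a plot. Here is a minimal sketch on the toy vectors from the question. It rests on two assumptions that are not in the original: every task a user appears with is counted (not only completed ones, unlike the code above), and "time per task" is taken to be the number of distinct days a task shows up in the panel. The names `tasks_per_user`, `days_per_task`, `days_per_user`, `mean_tasks` and `mean_days` are made up for illustration.

```r
library(dplyr)

# Toy panel built from the vectors in the question: one row per user, task and day
df <- data.frame(
  id_user = c(1, 1, 1, 2, 2, 3),
  id_task = c(1, 1, 2, 3, 4, 5),
  day     = c(1, 2, 1, 1, 2, 1)
)

# Number of distinct tasks each user appears with
tasks_per_user <- df %>%
  group_by(id_user) %>%
  summarise(n_tasks = n_distinct(id_task))

mean_tasks <- mean(tasks_per_user$n_tasks)

# ASSUMPTION: time spent on a task = number of distinct days it appears on
days_per_task <- df %>%
  group_by(id_user, id_task) %>%
  summarise(n_days = n_distinct(day), .groups = "drop")

# Average days per task for each user, then averaged over users
days_per_user <- days_per_task %>%
  group_by(id_user) %>%
  summarise(mean_days = mean(n_days))

mean_days <- mean(days_per_user$mean_days)

# Simple base R bar chart of tasks per user
barplot(tasks_per_user$n_tasks,
        names.arg = tasks_per_user$id_user,
        xlab = "id_user", ylab = "Number of tasks")
```

If the time per task should instead run from creation to completion, the day on which `completion_yesno` is 1 minus `day_created` may be a better measure, but that depends on how those columns are coded in the real data.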