I have a set of events that occur in the morning and in the afternoon, and would like to calculate the probability of each occurring in the morning vs the afternoon.
i.e.,
P = number of outcomes/total number of potential outcomes
For example, the probability that an Aggression event in the table below occurred in the morning vs the afternoon would be:
p morning = 3468/(3468+4658) = 0.4268
p afternoon = 1 - p morning = 1 - 0.4268 = 0.5732
Event_Time  Event_Name  Num_of_Occurances
Morning     Aggression  3468
Afternoon   Aggression  4658
Morning     SIB         900
Afternoon   SIB         1500
Morning     Elopement   400
Afternoon   Elopement   234
Morning     Pica        786
Afternoon   Pica        1234
Morning     Stereotypy  234
Afternoon   Stereotypy  633
Morning     Disruptive  534
Afternoon   Disruptive  780
I'm trying to find the best way to do this in R. I know I could pivot the table wide and add a column with the calculation, but I'm wondering if prop.table or another function can handle this more efficiently.
You can create a small function to make the calculations, and apply it by group:
library(data.table)
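# Morning share of the counts in one group; the afternoon share is its complement
f <- \(e, t) {
  pm <- e[t == "Morning"] / sum(e)
  return(list(p_morning = pm, p_afternoon = 1 - pm))
}
# dt is the data frame shown under "Input" below; f is applied once per Event_Name
setDT(dt)[, f(Num_of_Occurances, Event_Time), by = Event_Name]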
Output:
   Event_Name p_morning p_afternoon
1: Aggression 0.4267782   0.5732218
2:        SIB 0.3750000   0.6250000
3:  Elopement 0.6309148   0.3690852
4:       Pica 0.3891089   0.6108911
5: Stereotypy 0.2698962   0.7301038
6: Disruptive 0.4063927   0.5936073
Input:
dt <- structure(list(Event_Time = c("Morning", "Afternoon", "Morning",
"Afternoon", "Morning", "Afternoon", "Morning", "Afternoon",
"Morning", "Afternoon", "Morning", "Afternoon"), Event_Name = c("Aggression",
"Aggression", "SIB", "SIB", "Elopement", "Elopement", "Pica",
"Pica", "Stereotypy", "Stereotypy", "Disruptive", "Disruptive"
), Num_of_Occurances = c(3468L, 4658L, 900L, 1500L, 400L, 234L,
786L, 1234L, 234L, 633L, 534L, 780L)), row.names = c(NA, -12L
), class = "data.frame")
Of course, you don't need the function. Here is an alternative without a helper function, this time using dplyr instead of data.table:
library(dplyr)
# reframe() and the .by argument require dplyr >= 1.1.0
reframe(
  dt,
  p_morning = Num_of_Occurances[Event_Time == "Morning"] / sum(Num_of_Occurances),
  .by = Event_Name
) %>%
  mutate(p_afternoon = 1 - p_morning)
Output:
  Event_Name p_morning p_afternoon
1 Aggression 0.4267782   0.5732218
2        SIB 0.3750000   0.6250000
3  Elopement 0.6309148   0.3690852
4       Pica 0.3891089   0.6108911
5 Stereotypy 0.2698962   0.7301038
6 Disruptive 0.4063927   0.5936073
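Since the question asks about prop.table specifically, here is a rough base R sketch of that route, with no extra packages. It assumes the data has exactly the three columns shown in the Input block; the data frame is rebuilt inline only so the snippet runs on its own, and the names counts and props are just illustrative.

```r
# Same values as the Input block, rebuilt here so the snippet is self-contained
dt <- data.frame(
  Event_Time = rep(c("Morning", "Afternoon"), times = 6),
  Event_Name = rep(c("Aggression", "SIB", "Elopement",
                     "Pica", "Stereotypy", "Disruptive"), each = 2),
  Num_of_Occurances = c(3468L, 4658L, 900L, 1500L, 400L, 234L,
                        786L, 1234L, 234L, 633L, 534L, 780L)
)

# Cross-tabulate the counts: one row per event, one column per time of day
counts <- xtabs(Num_of_Occurances ~ Event_Name + Event_Time, data = dt)

# margin = 1 divides each cell by its row total, so every row sums to 1
props <- prop.table(counts, margin = 1)
props
```

The result is a table object rather than a data frame, and its rows and columns follow sorted factor levels rather than the order of appearance in dt. If you need it back in data frame form, as.data.frame(props) gives a long version with a Freq column holding the proportions.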