I am trying to remake this figure with my data.
I have 2 binary factors (P treatment & Embryo_Presence_vs._Absence) and 1 continuous value (Final_Size). Here is my data:
print.data.frame(x[c("P_Treatment", "Embryo_Presence_vs._Absence", "Final_Size")])
P_Treatment Embryo_Presence_vs._Absence Final_Size
1 Intermediate 0 3.80
2 Intermediate 0 3.76
3 Intermediate 0 3.70
4 Intermediate 0 3.73
5 Intermediate 0 3.10
6 Intermediate 0 3.77
7 Intermediate 0 3.59
8 Intermediate 0 3.73
9 Intermediate 0 3.36
10 Intermediate 0 3.81
11 Intermediate 0 3.72
12 Intermediate 0 3.92
13 Intermediate 0 3.56
14 Intermediate 0 3.78
15 Intermediate 0 3.63
16 Intermediate 0 3.27
17 Intermediate 0 3.60
18 Intermediate 0 3.74
19 Intermediate 0 3.60
20 Intermediate 0 3.65
21 Intermediate 0 3.59
22 Intermediate 0 3.75
23 Intermediate 0 3.78
24 Intermediate 0 3.55
25 Intermediate 0 3.65
26 Intermediate 0 3.65
27 Low 0 3.91
28 Low 0 3.72
29 Low 0 3.77
30 Low 0 3.73
31 Low 0 3.65
32 Low 0 3.57
33 Low 0 3.82
34 Low 0 3.76
35 Low 0 3.88
36 Low 0 4.10
37 Low 0 3.71
38 Low 0 3.57
39 Low 0 3.77
40 Low 0 3.60
41 Low 0 3.70
42 Low 0 3.55
43 Low 0 4.00
44 Low 0 3.56
45 Low 0 3.71
46 Low 0 3.61
47 Low 0 3.63
48 Low 0 3.72
49 Low 0 3.80
50 Low 0 3.86
51 Low 0 3.08
52 Low 0 3.81
53 Low 0 3.73
54 Low 0 3.84
55 Low 0 3.76
56 Low 0 3.66
57 Low 0 3.70
58 Low 0 3.71
59 Low 0 3.60
60 Low 0 3.75
61 Low 0 3.74
62 Intermediate 1 3.89
63 Low 1 3.87
64 Intermediate 1 3.99
65 Intermediate 1 3.93
66 Intermediate 1 3.65
67 Intermediate 1 3.64
68 Intermediate 1 3.81
69 Low 1 3.67
70 Low 1 3.52
71 Intermediate 1 3.70
72 Low 1 3.83
73 Low 1 3.65
74 Low 1 3.94
75 Intermediate 1 3.75
76 Low 1 3.71
77 Intermediate 1 3.56
78 Intermediate 1 3.89
79 Low 1 3.72
80 Low 1 3.25
81 Intermediate 1 3.79
82 Intermediate 1 3.60
83 Intermediate 1 3.88
84 Intermediate 1 3.75
85 Intermediate 1 3.75
86 Low 1 3.58
87 Intermediate 1 3.75
88 Intermediate 1 3.65
89 Low 1 3.60
90 Intermediate 1 3.68
91 Low 1 3.65
92 Intermediate 1 3.88
93 Intermediate 1 3.77
94 Intermediate 1 3.63
95 Low 1 3.68
96 Intermediate 1 3.83
97 Intermediate 1 3.85
98 Low 1 3.81
99 Intermediate 1 3.56
100 Intermediate 1 3.70
101 Low 1 3.74
102 Low 1 3.65
103 Intermediate 1 3.69
104 Intermediate 1 3.89
105 Intermediate 1 3.70
106 Intermediate 1 3.70
107 Intermediate 1 3.76
108 Intermediate 1 3.66
109 Intermediate 1 3.78
110 Intermediate 1 4.22
111 Intermediate 1 3.68
112 Low 1 3.82
113 Low 1 3.82
114 Low 1 3.76
115 Low 1 3.91
116 Intermediate 1 3.56
117 Low 1 3.66
118 Intermediate 1 3.65
119 Low 1 3.61
120 Intermediate 1 3.65
121 Low 1 4.16
122 Intermediate 1 3.74
123 Intermediate 1 3.60
124 Low 1 3.50
125 Low 1 3.76
126 Low 1 3.85
127 Low 1 3.83
128 Low 1 3.60
I am confused as to how to calculate the probability of brooding without binning the data by mother. The closest I am getting to this figure is a logistic regression plot without the smooth line. Here is my code:
ggplot(x, aes(Final_Size, Embryo_Presence_vs._Absence)) +
geom_jitter(height = 0.05) +
stat_smooth(method="glm", se= FALSE, method.args = list(family=binomial))
Even if I do somehow complete this figure it does not calculate probability of brooding. Does anyone have packages or functions that I can read about that would calculate empirical probability? I am finding many posts about probability density plots and I am lost on where to look next.
The figure you're trying to emulate shows the expected values, which we can calculate by fitting the model with glm()
:
snails <- glm(
# use * instead of + to allow odds ratios to differ b/t treatments
Embryo_Presence_vs._Absence ~ P_Treatment * Final_Size,
family = binomial(link = "logit"),
data = x
)
When constructing the plot, we can map the y
aesthetic to the expected values using fitted.values(snails)
instead of the observed values (Embryo_Presence_vs._Absence
). For example:
ggplot(x, aes(Final_Size, fitted.values(snails),)) +
geom_point(aes(shape = P_Treatment)) +
scale_shape_manual(values = c(1, 16)) + theme_classic()