Search code examples
rggplot2probabilitylogistic-regression

How to make a probability figure using a logistic regression?


I am trying to remake this figure with my data. https://link.springer.com/article/10.1007/s00442-009-1522-7#Fig2

I have 2 binary factors (P treatment & Embryo_Presence_vs._Absence) and 1 continuous value (Final_Size). Here is my data:

print.data.frame(x[c("P_Treatment", "Embryo_Presence_vs._Absence", "Final_Size")])

     P_Treatment Embryo_Presence_vs._Absence Final_Size
1   Intermediate                           0       3.80
2   Intermediate                           0       3.76
3   Intermediate                           0       3.70
4   Intermediate                           0       3.73
5   Intermediate                           0       3.10
6   Intermediate                           0       3.77
7   Intermediate                           0       3.59
8   Intermediate                           0       3.73
9   Intermediate                           0       3.36
10  Intermediate                           0       3.81
11  Intermediate                           0       3.72
12  Intermediate                           0       3.92
13  Intermediate                           0       3.56
14  Intermediate                           0       3.78
15  Intermediate                           0       3.63
16  Intermediate                           0       3.27
17  Intermediate                           0       3.60
18  Intermediate                           0       3.74
19  Intermediate                           0       3.60
20  Intermediate                           0       3.65
21  Intermediate                           0       3.59
22  Intermediate                           0       3.75
23  Intermediate                           0       3.78
24  Intermediate                           0       3.55
25  Intermediate                           0       3.65
26  Intermediate                           0       3.65
27           Low                           0       3.91
28           Low                           0       3.72
29           Low                           0       3.77
30           Low                           0       3.73
31           Low                           0       3.65
32           Low                           0       3.57
33           Low                           0       3.82
34           Low                           0       3.76
35           Low                           0       3.88
36           Low                           0       4.10
37           Low                           0       3.71
38           Low                           0       3.57
39           Low                           0       3.77
40           Low                           0       3.60
41           Low                           0       3.70
42           Low                           0       3.55
43           Low                           0       4.00
44           Low                           0       3.56
45           Low                           0       3.71
46           Low                           0       3.61
47           Low                           0       3.63
48           Low                           0       3.72
49           Low                           0       3.80
50           Low                           0       3.86
51           Low                           0       3.08
52           Low                           0       3.81
53           Low                           0       3.73
54           Low                           0       3.84
55           Low                           0       3.76
56           Low                           0       3.66
57           Low                           0       3.70
58           Low                           0       3.71
59           Low                           0       3.60
60           Low                           0       3.75
61           Low                           0       3.74
62  Intermediate                           1       3.89
63           Low                           1       3.87
64  Intermediate                           1       3.99
65  Intermediate                           1       3.93
66  Intermediate                           1       3.65
67  Intermediate                           1       3.64
68  Intermediate                           1       3.81
69           Low                           1       3.67
70           Low                           1       3.52
71  Intermediate                           1       3.70
72           Low                           1       3.83
73           Low                           1       3.65
74           Low                           1       3.94
75  Intermediate                           1       3.75
76           Low                           1       3.71
77  Intermediate                           1       3.56
78  Intermediate                           1       3.89
79           Low                           1       3.72
80           Low                           1       3.25
81  Intermediate                           1       3.79
82  Intermediate                           1       3.60
83  Intermediate                           1       3.88
84  Intermediate                           1       3.75
85  Intermediate                           1       3.75
86           Low                           1       3.58
87  Intermediate                           1       3.75
88  Intermediate                           1       3.65
89           Low                           1       3.60
90  Intermediate                           1       3.68
91           Low                           1       3.65
92  Intermediate                           1       3.88
93  Intermediate                           1       3.77
94  Intermediate                           1       3.63
95           Low                           1       3.68
96  Intermediate                           1       3.83
97  Intermediate                           1       3.85
98           Low                           1       3.81
99  Intermediate                           1       3.56
100 Intermediate                           1       3.70
101          Low                           1       3.74
102          Low                           1       3.65
103 Intermediate                           1       3.69
104 Intermediate                           1       3.89
105 Intermediate                           1       3.70
106 Intermediate                           1       3.70
107 Intermediate                           1       3.76
108 Intermediate                           1       3.66
109 Intermediate                           1       3.78
110 Intermediate                           1       4.22
111 Intermediate                           1       3.68
112          Low                           1       3.82
113          Low                           1       3.82
114          Low                           1       3.76
115          Low                           1       3.91
116 Intermediate                           1       3.56
117          Low                           1       3.66
118 Intermediate                           1       3.65
119          Low                           1       3.61
120 Intermediate                           1       3.65
121          Low                           1       4.16
122 Intermediate                           1       3.74
123 Intermediate                           1       3.60
124          Low                           1       3.50
125          Low                           1       3.76
126          Low                           1       3.85
127          Low                           1       3.83
128          Low                           1       3.60

I am confused as to how to calculate the probability of brooding without binning the data by mother. The closest I am getting to this figure is a logistic regression plot without the smooth line. Here is my code:

ggplot(x, aes(Final_Size, Embryo_Presence_vs._Absence)) +
  geom_jitter(height = 0.05) +
  stat_smooth(method="glm", se= FALSE, method.args = list(family=binomial))

Poor logistic regression plot

Even if I do somehow complete this figure it does not calculate probability of brooding. Does anyone have packages or functions that I can read about that would calculate empirical probability? I am finding many posts about probability density plots and I am lost on where to look next.


Solution

  • The figure you're trying to emulate shows the expected values, which we can calculate by fitting the model with glm():

    snails <- glm(
      # use * instead of + to allow odds ratios to differ b/t treatments
      Embryo_Presence_vs._Absence ~ P_Treatment * Final_Size,
      family = binomial(link = "logit"),
      data = x
    )
    

    When constructing the plot, we can map the y aesthetic to the expected values using fitted.values(snails) instead of the observed values (Embryo_Presence_vs._Absence). For example:

    ggplot(x, aes(Final_Size, fitted.values(snails),)) + 
      geom_point(aes(shape = P_Treatment)) +
      scale_shape_manual(values = c(1, 16)) + theme_classic()
    

    enter image description here