ggpredict and quartiles: predictions with only one data point?

I am using ggeffects::ggpredict() to plot a regression model that includes two-way interactions: categorical:continuous, continuous:continuous, and continuous:continuous (2nd-degree polynomial). My first choice is to plot the interaction between continuous variables using quartiles, including the minimum and maximum values, by setting terms = c("var.cont1[all]", "var.cont2[quart]"), as shown in this ggeffects vignette. In this vignette, the minimum and maximum values of the variable of interest have many observations.

However, in my data I have a single observation for both the minimum and the maximum, and ggpredict() plots predictions just fine, although with very wide confidence intervals for the max value. In such cases, does ggpredict() plot a prediction based on a single data point? I want to be sure I understand what's going on. Many thanks in advance!

I have provided a reproducible example below, in which I force the data to have a single observation for the min and max values. The data comes from this ggeffects vignette.

library(ggeffects)
library(dplyr)

data(efc)
summary(efc$c12hour)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>    4.00   10.00   20.00   42.40   42.75  168.00       6

c(sum(!is.na(efc$c12hour) & efc$c12hour == 4), sum(!is.na(efc$c12hour) & efc$c12hour == 168))
#> [1] 24 86

efc2 <- efc[!is.na(efc$c12hour), ]
efc2 <- arrange(efc2, c12hour)
efc2$c12hour[1] <- 3
efc2$c12hour[nrow(efc2)] <- 169
summary(efc2$c12hour)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    3.00   10.00   20.00   42.40   42.75  169.00
table(efc2$c12hour)
#>   3   4   5   6   7   8   9  10  11  12  14  15  16  17  18  20  21  22  24  25 
#>   1  23  29  60  32  42  11  70   6  23  34  43  12   4  16  58  12  11  20  28 
#>  26  27  28  30  35  36  39  40  42  43  45  48  49  50  55  56  59  60  62  65 
#>   1   2  28  40  24   1   1  35   9   1   4   4   2  22   1   6   1   7   1   3 
#>  70  77  80  84  85  89  90  91 100 105 110 118 119 120 125 126 128 130 140 148 
#>  13   3   5   6   2   1   2   3  13   1   4   1   1   8   1   1   1   1   6   1 
#> 150 154 160 161 162 168 169 
#>   4   1   6   2   1  85   1

fit <- lm(barthtot ~ c12hour * c161sex + neg_c_7, data = efc2)
plot(ggpredict(fit, terms = c("c161sex", "c12hour[quart]")))

Created on 2023-10-05 with reprex v2.0.2


  • Internally, ggpredict() calls predict() (and ggemmeans() calls emmeans(), ggeffect() calls effect()). The predictions are based on the model's coefficients, which are estimated from all observations, not only from the data points at the value being predicted. The confidence intervals (i.e. how narrow or wide they are) depend on the amount of data.

    Maybe this vignette clarifies how predicted values relate to regression coefficients.

    In your particular example, I assume that because there is only one data point at each of the "tails", the confidence intervals are wide, indicating that with so few data there is a larger range of plausible values for your predictions (higher "uncertainty", if you like).
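
To make the relationship between coefficients, predictions and interval width concrete, here is a minimal sketch in base R. It uses simulated data rather than the efc data; the variables x and y and all numbers in it are made up for illustration. It only calls lm() and predict(), so it shows the idea behind the predictions rather than what ggpredict() does step by step.

```r
# Simulated data (hypothetical, not the efc data): most x values lie
# between 5 and 50, plus a single observation at the maximum of 169.
set.seed(1)
x <- c(runif(200, min = 5, max = 50), 169)
y <- 80 - 0.3 * x + rnorm(length(x), sd = 10)
fit <- lm(y ~ x)

# Predictions with confidence intervals at the median and the maximum of x
new_data <- data.frame(x = c(median(x), max(x)))
pred <- predict(fit, newdata = new_data, interval = "confidence")

# The predicted value is intercept + slope * x, where both coefficients
# were estimated from all 201 observations
by_hand <- coef(fit)[1] + coef(fit)[2] * new_data$x
same_values <- isTRUE(all.equal(unname(pred[, "fit"]), unname(by_hand)))

# Width of the confidence interval at the median and at the maximum
ci_width <- unname(pred[, "upr"] - pred[, "lwr"])
print(same_values)
print(ci_width)
```

In this sketch the prediction at the maximum is not computed from the single observation located there. It comes from the same fitted line as every other prediction, and the interval is wider at the maximum because that value lies far from where most of the data are.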