
Changing ggpredict x-axis labels


I ran a model in which I had to rescale the continuous variable "year" to the range 1 to 15 (1st to 15th year) to get the intercept error under control. Now I'd like to change the x-axis labels back to their original form (2008 to 2022). I've read that the plot produced by ggpredict can be modified like any ggplot(), but my method doesn't seem to work. Is there something else I can try?

library(ggeffects)
library(ggplot2)   # labs(), scale_x_continuous(), theme()
library(magrittr)  # %>%

ggpredict(mod, c("CYR.std", "Season")) %>% plot(rawdata = TRUE, jitter = .01) + 
  labs(title = "Predicted counts of Gulf Toadfish")


ggpredict(mod, c("CYR.std", "Season")) %>% plot(rawdata = TRUE, jitter = .01) + 
  labs(title = "Predicted counts of Gulf Toadfish", x = "Year", y = "Count") + 
  scale_x_continuous(breaks = 2008:2022,
                     labels = as.character(2008:2022)) + 
  theme(axis.text.x=element_text(angle = 45, hjust = 0))


My raw data is too large to share, but this simulated example reproduces the problem:

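library(ggeffects)
library(ggplot2)
library(magrittr) # %>%

set.seed(123) # make the simulated data reproducible
n <- 10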

#regression coefficients
beta0 <- 1
beta1 <- 0.2

#generate covariate values
x <- runif(n=n, min=0, max=1.5)

#compute the Poisson means mu
mu <- exp(beta0 + beta1 * x)

#generate Y-values
y <- rpois(n=n, lambda=mu)

#data set
data <- data.frame(y=y, x=x)

model <- glm(y ~ x, data = data, family = poisson)

ggpredict(model, c("x")) %>% plot(rawdata = TRUE, jitter = .01) + 
  scale_x_continuous(breaks=c(2008, 2009, 2010), 
                     labels=c('2008', '2009', '2010'))

Solution

  • I see a chain:

    Data |> Model |> ggpredict |> Plot

    The further down the chain you go, the more complex the modification you have to make to alter the data.

    Plot

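    Your approach started at the last point of the chain. The breaks you passed (2008 to 2022) lie outside the plotted x range (1 to 15), so ggplot2 does not draw them. The transformations available in scale_x_...() (log and the like) change how values are positioned on the axis; as far as I know, there is no simple way there to apply the linear transformation needed to turn 1:15 back into 2008:2022 (please correct me).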
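    For relabeling only, a sketch that stays at the Plot stage: put the breaks at values that are actually plotted and pass the years as labels. The stand-in data frame df is an assumption replacing the model output, with x playing the role of CYR.std (1 to 15); the mapping 1 -> 2008 is taken from the question.

```r
library(ggplot2)

# Stand-in data (assumption): x plays the role of the rescaled year
# CYR.std = 1..15 from the question; y is an arbitrary smooth curve.
df <- data.frame(x = 1:15, y = exp(0.1 * (1:15)))

# Breaks sit where the data actually lie (1..15);
# the labels carry the original years (1 -> 2008, ..., 15 -> 2022).
p <- ggplot(df, aes(x, y)) +
  geom_line() +
  scale_x_continuous(breaks = 1:15, labels = 2008:2022)

# Same idea with a labelling function, assuming year = CYR.std + 2007.
p2 <- ggplot(df, aes(x, y)) +
  geom_line() +
  scale_x_continuous(breaks = 1:15, labels = function(b) b + 2007)
```

    Since plot() on a ggpredict object returns a ggplot, the same scale_x_continuous(breaks = 1:15, labels = 2008:2022) term should also work when added after it (ggplot2 may report that it replaces the existing x scale). Only the labels change; the plotted x values remain 1 to 15.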

    Data

    So the best place to slip in transformed data is before modelling, i.e. finding a way for the model to handle your time variable accurately without a linear transformation. One idea is to convert the time variable to a date object and check whether the modelling functions have methods for date objects.

    I do not know the statistical side of this. Maybe the transformation you are doing to keep the intercept in check is not necessary, or maybe there are other ways to represent a time variable accurately in your model.
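    On the statistical side, a sketch under assumptions (simulated data like the question's; the seed, the sample size and the year mapping are mine): for a GLM with an intercept, linearly rescaling a covariate changes the intercept and slope but not the fitted values. If that holds for your model, fitting on the original year variable should let ggpredict() put years on the x axis directly, although more complex models (for example with random effects) may still need centering for numerical stability.

```r
# Simulated data in the style of the question (seed, n and year mapping are assumptions).
set.seed(1)
n <- 50
x <- runif(n, min = 0, max = 1.5)
dat <- data.frame(
  x = x,
  y = rpois(n, lambda = exp(1 + 0.2 * x)),
  year = 2008 + 2 * x   # the same covariate on a hypothetical "year" scale
)

fit_x    <- glm(y ~ x,    data = dat, family = poisson)
fit_year <- glm(y ~ year, data = dat, family = poisson)

# Intercept and slope change: the intercept now refers to year 0,
# and the slope per year is half the slope per unit of x.
coef(fit_x)
coef(fit_year)

# The fitted values (and hence the fitted curve) are the same.
all.equal(fitted(fit_x), fitted(fit_year))
```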

    But let's suppose it is necessary to linearly transform the year variable before modelling.

    Model

    I think the best approach now would be to tamper with the model attributes and slip in some transformed data. I tried that, but my knowledge of Poisson-family models is nonexistent, and it only worked halfway: ggpredict() recognized the new data points, but something was wrong with the fitted curve. I suspect I would also have had to modify further model attributes that are later used to plot the fit.

    ggpredict

    The next best entry point in the chain to slip in transformed data is within the ggpredict object, which is generated from the model. This is what I chose.

    Using your example code:

    set.seed(123)
    
    library(ggeffects)
    library(ggplot2)
    
    # Transform
    
    range(x) # x was drawn from [0, 1.5]; for simplicity, this example maps [0, 1] onto the years
    # our known range of years is [2008; 2010]
    
    # linear transformation: year = b0 + x * b1
    # I solved the equation system:
    # 2008 = b0 + 0 * b1
    # 2010 = b0 + 1 * b1
    
    b0 <- 2008
    b1 <- 2
    
    ggp <- ggpredict(model, c("x")) # create ggpredict object
    
    unaltered <- plot(ggp, rawdata = TRUE) # test plot unaltered
    unaltered # raw points move on every print, likely due to random jitter
    
    # Alter the ggpredict object
    
    # transform the prediction grid (column x); it stays sorted because b1 > 0
    ggp$x <- b0 + ggp$x * b1
    # transform the raw data in place without sorting,
    # so each x value stays paired with its response
    attr(ggp, "rawdata")$x <- b0 + attr(ggp, "rawdata")$x * b1
    
    # Final plot
    plot(ggp, rawdata = TRUE)
    


    Explanation

    To transform the data ranging from 0 to 1 into a range from 2008 to 2010, we can use the linear transformation year = b0 + x * b1. To obtain the parameters b0 and b1, we solve a system of two linear equations, one for each known pair of model value and year.
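    The two-equation system can also be solved numerically with base R's solve(), as a sketch. The first pair of correspondences is the one used in the code above; the second (CYR.std 1 -> 2008, 15 -> 2022) is my reading of the question's rescaling, and to_year() is a hypothetical helper.

```r
# Answer's example: model value 0 -> 2008 and 1 -> 2010.
A <- rbind(c(1, 0),   # 2008 = b0 + 0 * b1
           c(1, 1))   # 2010 = b0 + 1 * b1
coefs <- solve(A, c(2008, 2010))
coefs                       # b0 = 2008, b1 = 2

# Question's case (assumed): CYR.std 1 -> 2008 and 15 -> 2022.
coefs_cyr <- solve(rbind(c(1, 1), c(1, 15)), c(2008, 2022))
coefs_cyr                   # b0 = 2007, b1 = 1

# Hypothetical helper applying year = b0 + x * b1
to_year <- function(x, b) b[1] + x * b[2]
to_year(c(0, 0.5, 1), coefs)   # 2008 2009 2010
```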

    Then we create the ggpredict object, which is essentially a data frame with some further attributes containing modeling information.

    To slip in the transformed data, we have to put it in (at least) two places: in the data frame itself and in its rawdata attribute.
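    The two places can be handled together. Below is a sketch with a hypothetical helper, rescale_focal_x() (not part of ggeffects), that applies the same linear map to the prediction grid and to the rawdata attribute without sorting the raw data, so each x stays paired with its response. It is tested on a mock object. The x column in the data frame and in rawdata follows the code above; the response column name is an assumption.

```r
# Hypothetical helper (not part of ggeffects): apply year = b0 + b1 * x
# to the prediction grid and to the stored raw data.
rescale_focal_x <- function(obj, b0, b1) {
  obj$x <- b0 + b1 * obj$x                # prediction grid (column x)
  raw <- attr(obj, "rawdata")
  if (!is.null(raw)) {
    raw$x <- b0 + b1 * raw$x              # row order kept: x stays paired with its response
    attr(obj, "rawdata") <- raw
  }
  obj
}

# Mock object standing in for a ggpredict result (assumed layout).
mock <- data.frame(x = c(0, 0.5, 1), predicted = c(2.7, 3.0, 3.3))
attr(mock, "rawdata") <- data.frame(x = c(0.8, 0.1, 0.4), response = c(3, 2, 4))

res <- rescale_focal_x(mock, b0 = 2008, b1 = 2)
res$x                      # 2008 2009 2010
attr(res, "rawdata")$x     # 2009.6 2008.2 2008.8
```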

    To test whether the fit keeps its shape after the manipulation, I saved an unaltered plot. What puzzled me, and seemed like a critical point that might render this method useless: every time the plot is printed, even without manipulation, the raw data points land in different places. This is most likely intended: plot() jitters the raw data points randomly to avoid overplotting (the question's code sets the amount with jitter = .01), so only the points move and the fitted curve is unaffected.

    I know this solution is far from optimal. As stated above, consider changing something further up the chain.