
R fmodel(): how many variables can be used?


For the function fmodel(model_object, ~ x_var + color_var + facet_var), is it possible to visualize a model with additional variables?

With 3 variables there is no problem creating a graph:

install.packages('statisticalModeling')
library(statisticalModeling)

mod1 <- lm(wage ~ age + sex + sector, data = mosaicData::CPS85)
fmodel(mod1, ~ age + sex + sector)


With 5 variables, only the first 4 are included in the graph. Is this the maximum the function allows? Is there a way I can graph all 5 variables?

mod1 <- lm(wage ~ age + sex + sector + married + educ, data = mosaicData::CPS85)
fmodel(mod1, ~ age + sex + sector + married + educ)


I have tried "forcing" the last variable to be displayed by specifying particular values for the 5th explanatory variable, but the result is a strange-looking plot.

mod1 <- lm(wage ~ age + sex + sector + married + educ, data = mosaicData::CPS85)
fmodel(mod1, ~ age + sex + sector + married + educ,
       educ = c(10, 12, 16))



Solution

  • There are three main problems here. The first is that the statisticalModeling package is no longer on CRAN, and the archived version appears to have last been updated over 6 years ago. It may therefore have compatibility issues with ggplot2, which has been updated extensively since then.

    The second is that, like many ggplot2 wrappers, fmodel aims to make it easier to create a ggplot, but what you gain in ease of use you lose in the ability to customize the plot or to produce plots that were not part of the design spec. In these situations it is often best to wrangle the data yourself and plot with ggplot2 directly.

    The third (and most important) issue is that your plot already contains a lot of information, and adding yet another aesthetic mapping only makes it harder to read.
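    To put rough numbers on "a lot of information", the sketch below counts how many distinct values each predictor takes in CPS85. It then works out how many facet panels and lines a full prediction grid would need. It assumes the layout used in this answer: age on the x axis, sex as linetype, educ as colour, and facets for sector by married. The names n_levels, n_panels and n_lines_per_panel are illustrative only.

    ```r
    # Distinct values of each predictor in the data (requires the mosaicData package)
    cps <- mosaicData::CPS85
    n_levels <- sapply(cps[c("age", "sex", "sector", "married", "educ")],
                       function(x) length(unique(x)))
    print(n_levels)

    # One facet panel per sector x married combination
    n_panels <- n_levels[["sector"]] * n_levels[["married"]]
    # One line per educ x sex combination inside each panel (age is the x axis)
    n_lines_per_panel <- n_levels[["educ"]] * n_levels[["sex"]]

    cat("panels:", n_panels,
        " lines per panel:", n_lines_per_panel,
        " total lines:", n_panels * n_lines_per_panel, "\n")
    ```
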

    All that said, if you really want to produce a plot that demonstrates the effects of 5 different predictor variables across all their possible values, you could do:

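    library(ggplot2)
    
    mod1 <- lm(wage ~ age + sex + sector + married + educ, data = mosaicData::CPS85)
    
    # Grid containing every combination of the observed predictor values
    pred_df <- with(mosaicData::CPS85, 
                    expand.grid(age     = unique(age),
                                sector  = unique(sector),
                                married = unique(married),
                                sex     = unique(sex),
                                educ    = unique(educ)))
    
    # Predicted wage for each row of the grid
    pred_df$wage <- predict(mod1, newdata = pred_df)
    
    # x = age, linetype = sex, colour = educ, facets = sector x married;
    # group gives one line per educ/sex combination in each panel
    ggplot(pred_df, aes(age, wage, linetype = sex, color = educ, 
                        group = interaction(educ, sex))) +
      geom_line() +
      facet_grid(sector ~ married) +
      scale_color_viridis_c() +
      theme_bw(base_size = 16)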
    


    You can see why the authors of the statisticalModeling package may have held back from adding this facility. This is a pretty awful plot from a data visualization perspective.
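    For a less crowded version, one option is to follow the question's own idea and hold educ at a few representative values (10, 12 and 16 years). Those values are then mapped to a discrete colour scale instead of the continuous viridis scale. This is a variation sketched here, not the approach above, and it assumes the mosaicData and ggplot2 packages are installed. The names pred_small, educ_f and p are illustrative.

    ```r
    library(ggplot2)

    # Same model as above
    mod1 <- lm(wage ~ age + sex + sector + married + educ, data = mosaicData::CPS85)

    # Vary every predictor, but hold educ at three representative values
    # (the ones tried in the question) instead of every observed value
    pred_small <- with(mosaicData::CPS85,
                       expand.grid(age     = seq(min(age), max(age), by = 1),
                                   sex     = unique(sex),
                                   sector  = unique(sector),
                                   married = unique(married),
                                   educ    = c(10, 12, 16)))

    pred_small$wage <- predict(mod1, newdata = pred_small)

    # With only three educ values, a discrete colour scale is easier to read
    pred_small$educ_f <- factor(pred_small$educ)

    p <- ggplot(pred_small, aes(age, wage, colour = educ_f, linetype = sex,
                                group = interaction(educ_f, sex))) +
      geom_line() +
      facet_grid(sector ~ married) +
      labs(colour = "educ (years)") +
      theme_bw()

    print(p)
    ```

    All five predictors are still shown, but each panel has six lines (three educ values times two sexes) rather than one per observed educ value. The choice of which educ values to hold fixed is a judgment call; quantiles of educ would be another reasonable option.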