Tags: r, ggplot2, smoothing

Can I get geom_smooth() to allow line breaks when there are NA values?


I'm hoping to find a way for line breaks to show up while using geom_smooth() - is this possible?

Here's sample data and code I'm using and the resulting plot:

library(tidyverse)  # provides tibble() and ggplot()

game_number <- 1:52

toi <- c(NA, NA, NA, NA, 20.4, 20.2, 19.4, 18.6, 17.8, 17.1, 17.7, 17.3, 16.8, 17.1, 17.8, 17.3, 16.6,
        16.9, 17.4, 16.9, 16.1, 16.6, 16.9, 16.4, NA, NA, NA, NA, NA, NA, 16.9, 18.2, 18.5, 16.6, 16.3, 15.7, 
        15.1, 14.7, 16.5, 17.9, 16.9, NA, 17.6, 18.1, 17.9, 17.2, 18.2, 18.0, 17.3, 17.8, 18.3, 17.9)

toi_df <- tibble(player = 'Nils Lundkvist', game_number = game_number, toi = toi)
plot <- ggplot(toi_df, aes(x = game_number, y = toi, group = player, colour = player)) +
  geom_line(size = 0.6) +
  geom_smooth(se = FALSE, size = 1) +
  scale_y_continuous(limits = c(0, 25), expand = c(0, 0))

The resulting plot looks like this. You can see the NA line breaks in geom_line(), but the geom_smooth() line connects across the NA values. Is there a way to get geom_smooth() to behave like geom_line() in this scenario? Or is there some other ggplot command to use instead? Thank you!

[Image: geom_smooth() ignoring line breaks for NA values]


Solution

  • One approach is to compute the smoothed values yourself in a separate data frame and then merge them back into the original data. Here is a version using the broom and tidyverse packages:

    library(tidyverse)
    library(broom)
    

    First the data:

    #Data
    game_number <- c(1:52)
    toi <- c(NA, NA, NA, NA, 20.4, 20.2, 19.4, 18.6, 17.8, 17.1, 17.7, 17.3, 16.8, 17.1, 17.8, 17.3, 16.6,
             16.9, 17.4, 16.9, 16.1, 16.6, 16.9, 16.4, NA, NA, NA, NA, NA, NA, 16.9, 18.2, 18.5, 16.6, 16.3, 15.7, 
             15.1, 14.7, 16.5, 17.9, 16.9, NA, 17.6, 18.1, 17.9, 17.2, 18.2, 18.0, 17.3, 17.8, 18.3, 17.9)
    toi_df <- tibble(player = 'Nils Lundkvist', game_number = game_number, toi = toi)
    

    Now, we compute the smooth model:

    #Create smooth
    model <- loess(toi ~ game_number, data = toi_df)
    

    We create a dataframe to save the results:

    #Augment model output in a new dataframe
    toi_df2 <- augment(model, toi_df)
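
    As a side note, the same kind of NA-padded column can be sketched with base R alone, which avoids depending on how augment() treats the rows that loess() drops. This is a minimal sketch, not the method used in the rest of this answer; it assumes that passing na.action = na.exclude to loess() is acceptable for your data, and it uses a plain data.frame instead of a tibble so that no packages are needed:

```r
# Same sample data as in the question, stored in a plain data frame.
toi_values <- c(NA, NA, NA, NA, 20.4, 20.2, 19.4, 18.6, 17.8, 17.1, 17.7, 17.3,
                16.8, 17.1, 17.8, 17.3, 16.6, 16.9, 17.4, 16.9, 16.1, 16.6,
                16.9, 16.4, NA, NA, NA, NA, NA, NA, 16.9, 18.2, 18.5, 16.6,
                16.3, 15.7, 15.1, 14.7, 16.5, 17.9, 16.9, NA, 17.6, 18.1,
                17.9, 17.2, 18.2, 18.0, 17.3, 17.8, 18.3, 17.9)
toi_df <- data.frame(player = "Nils Lundkvist",
                     game_number = 1:52,
                     toi = toi_values)

# na.exclude leaves the NA rows out of the fit but remembers their positions,
# so predict() returns one value per original row, with NA in the gaps.
model <- loess(toi ~ game_number, data = toi_df, na.action = na.exclude)

# as.numeric() drops the names that the NA padding attaches to the vector.
toi_df$.fitted <- as.numeric(predict(model))

# Sanity check: the smooth is missing exactly where toi is missing.
stopifnot(all(is.na(toi_df$.fitted) == is.na(toi_df$toi)))
```

    The resulting .fitted column can be drawn with geom_line(aes(y = .fitted)), and geom_line() will break the line wherever the column is NA.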
    

    We merge the data:

    #Merge data
    toi_df3 <- merge(toi_df,
                     toi_df2[, c("player", "game_number", ".fitted")],
                     by = c("player", "game_number"), all.x = TRUE)
    

    Finally, we plot using geom_line():

    #Plot
    ggplot(toi_df3, aes(x = game_number, y = toi, group = player, colour = player)) +
      geom_line(size = 0.6) +
      geom_line(aes(y=.fitted),size=1) +
      scale_y_continuous(limits = c(0, 25), expand = c(0, 0))
    


    The approach also works with more than one player. In that case you can group by player (group_by() from dplyr) and use do() to fit a smooth model for each player.
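
    The grouped idea can be sketched as follows. This is only one possible version and it makes a few assumptions that are mine rather than the original answer's: it uses mutate() inside group_by() instead of do() (which dplyr now marks as superseded), it relies on na.action = na.exclude so that predict() pads the gaps with NA, and it builds a second player by shifting the sample values by 15, purely for illustration:

```r
library(dplyr)

# Sample values from the question; the second player is an illustrative copy
# shifted up by 15 minutes.
toi_values <- c(NA, NA, NA, NA, 20.4, 20.2, 19.4, 18.6, 17.8, 17.1, 17.7, 17.3,
                16.8, 17.1, 17.8, 17.3, 16.6, 16.9, 17.4, 16.9, 16.1, 16.6,
                16.9, 16.4, NA, NA, NA, NA, NA, NA, 16.9, 18.2, 18.5, 16.6,
                16.3, 15.7, 15.1, 14.7, 16.5, 17.9, 16.9, NA, 17.6, 18.1,
                17.9, 17.2, 18.2, 18.0, 17.3, 17.8, 18.3, 17.9)
toi_dfm <- rbind(
  data.frame(player = "Nils Lundkvist", game_number = 1:52, toi = toi_values),
  data.frame(player = "Zach Ellenthal", game_number = 1:52, toi = toi_values + 15)
)

# Fit one loess per player. Inside mutate(), toi and game_number refer to the
# current player's rows, and na.exclude keeps NA where toi is NA.
smoothed <- toi_dfm %>%
  group_by(player) %>%
  mutate(.fitted = as.numeric(predict(
    loess(toi ~ game_number, na.action = na.exclude)
  ))) %>%
  ungroup()
```

    The smoothed data frame has one .fitted value per row, so it can be plotted with two geom_line() layers in the same way as the single-player case.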

    Update:

    Here is code for multiple players. I wrote a function, myfunsmooth(), that fits loess() to one player's data. First use split() to get a list with one data frame per player, then apply the function to each element of the list, bind the results, and draw the plot. Here is the code:

    The dummy data:

    #Data
    game_number <- c(1:52)
    toi <- c(NA, NA, NA, NA, 20.4, 20.2, 19.4, 18.6, 17.8, 17.1, 17.7, 17.3, 16.8, 17.1, 17.8, 17.3, 16.6,
             16.9, 17.4, 16.9, 16.1, 16.6, 16.9, 16.4, NA, NA, NA, NA, NA, NA, 16.9, 18.2, 18.5, 16.6, 16.3, 15.7, 
             15.1, 14.7, 16.5, 17.9, 16.9, NA, 17.6, 18.1, 17.9, 17.2, 18.2, 18.0, 17.3, 17.8, 18.3, 17.9)
    toi_df <- tibble(player = 'Nils Lundkvist', game_number = game_number, toi = toi)
    toi_df0 <- tibble(player = 'Zach Ellenthal', game_number = game_number, toi = toi)
    toi_df0$toi <- toi_df0$toi+15 
    toi_dfm <- rbind(toi_df,toi_df0)
    

    The function for loess():

    #Function for smoothing
    myfunsmooth <- function(x)
    {
      #Model
      model <- loess(toi ~ game_number, data = x)
      #Augment model output in a new dataframe
      y <- augment(model, x)
      #Merge data
      z <- merge(x, y[, c("player", "game_number", ".fitted")],
                 by = c("player", "game_number"), all.x = TRUE)
      #Return
      return(z)
    }
    

    Then, we create the list:

    #Create list by player
    List <- split(toi_dfm,toi_dfm$player)
    

    We apply the function and bind the results in a new dataframe:

    #Apply function
    List2 <- lapply(List, myfunsmooth)
    #Bind all
    dfglobal <- do.call(rbind,List2)
    rownames(dfglobal)<-NULL
    

    Finally, we plot:

    #Plot
    ggplot(dfglobal, aes(x = game_number, y = toi, group = player, colour = player)) +
      geom_line(size = 0.6) +
      geom_line(aes(y=.fitted),size=1) +
      scale_y_continuous(limits = c(0, 45), expand = c(0, 0)) 
    
