
Create secondary axis for stacked barplot with line trace in R plotly


I am implementing a function which builds a stacked % barplot with an additional line trace. I can create the stacked bar plot using either a wide-form or a long-form dataframe. The two code sections below produce plots that look essentially the same:

Using a wide-form dataframe:

library(dplyr)
library(tidyr)   # provides pivot_longer(), used in the long-form code below
library(plotly)  # install.packages("plotly")

# simple example data for SO post
some_dates = c(as.Date('2021-01-01'), as.Date('2021-02-01'),
               as.Date('2021-03-01'), as.Date('2021-04-01'))

bar1 = c(0.25,0.45,0.65,0.75)
bar2 = c(0.60,0.40,0.20,0.10)
bar3 = c(0.15,0.15,0.15,0.15)

line_data = c(0,1,2,3)

# wide form dataframe
df_bars = data.frame("db" = some_dates, "b1" = bar1,
                     "b2" = bar2, "b3" = bar3)

df_line = data.frame("line_dates" = some_dates, "line" = line_data)

plot_so1 = plot_ly(x = df_bars$db,
                   y = df_bars[[colnames(df_bars)[2]]],
                   type = 'bar',
                   name = colnames(df_bars)[2]) %>% 
  layout(title = 'My plot title',
         xaxis = list(title = 'db'),
         yaxis = list(title = 'CatProportions'),
         barmode = 'stack',
         showlegend = TRUE)

# Now loop through the rest of the columns except for the ones already used.
# This is done because "in the wild", the plot is being built in a function
# that has data which is passed to it so the number and names of the columns
# that are used to build the plot are not known in advance.
# Iterate over the columns of df_bars (not the number of dates).
for (col_index in 3:ncol(df_bars)) {
  plot_so1 =
    add_trace(plot_so1,
              x = df_bars$db,
              y = df_bars[[colnames(df_bars)[col_index]]],
              name = colnames(df_bars)[col_index])
}

Using a long-form dataframe:

## long form of dataframe #########################
df_bars_long = df_bars %>%
  pivot_longer(!db, names_to = "Categories", values_to = "CatProportions")
# build same plot from long form dataframe
plot_so2 = plot_ly(data = df_bars_long,
                   x = ~db, y = ~CatProportions,
                   color = ~Categories,
                   type = "bar") %>% 
  layout(barmode = "stack")

## above works, now try to add the line trace #####
plot_so2 = plot_ly(data = df_bars_long,
                   x = ~db, y = ~CatProportions,
                   color = ~Categories,
                   type = "bar") %>% 
  # add_trace(x = df_line$line_dates,
  #           y = df_line$line,
  #           type = 'scatter', mode = 'lines', name = 'my line',
  #           line = list(color = '#000000')) %>% 
  layout(title = 'My plot title',
         xaxis = list(title = 'db'),
         yaxis = list(title = 'CatProportions'),
         barmode = 'stack',
         showlegend = TRUE)


I understand that long-form data is the recommended input for plots like this. I show both methods because I want to add a line trace using data from another dataframe, which has one column for the x values and one column for the y values. So far I have only been able to add this trace with the wide-form code, by appending the following segment to it:

plot_so1 = add_trace(plot_so1,
                     x = df_line$line_dates,
                     y = df_line$line,
                     type = 'scatter', mode = 'lines', name = 'my line',
                     line = list(color = '#000000'))

This adds the line to the stacked bars, but the line is drawn against the same y-axis as the bars.

My primary question is: how do I create a secondary y-axis for the line trace in the wide-form dataframe code? My secondary question is: can the final plot I'm looking for be built from the long-form dataframe, and if so, how?

This post got me started on this problem:

Stacked Bar Chart with Line Chart not working in R with plotly

but it didn't involve a stacked barplot which seems to make life more interesting.


Solution

  • This part is unchanged:

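    library(dplyr)
    library(tidyr)   # pivot_longer()
    library(plotly)

    some_dates = c(as.Date('2021-01-01'), as.Date('2021-02-01'),
                   as.Date('2021-03-01'), as.Date('2021-04-01'))
    bar1 = c(0.25,0.45,0.65,0.75)
    bar2 = c(0.60,0.40,0.20,0.10)
    bar3 = c(0.15,0.15,0.15,0.15)
    line_data = c(0,1,2,3)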
    
    # wide form dataframe
    df_bars = data.frame("db" = some_dates, "b1" = bar1,
                         "b2" = bar2, "b3" = bar3)
    
    df_line = data.frame("line_dates" = some_dates, "line" = line_data)
    

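    To draw the line from the long-form data, each date needs its line value, so we join df_line to df_bars_long with merge(). The join repeats the line value once per category, so within each date every duplicate is set to NA and each date carries its value only once: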

    df_bars_long = df_bars %>%
      pivot_longer(!db, names_to = "Categories", values_to = "CatProportions") %>%
      merge(df_line, by.y = "line_dates", by.x = "db") %>%
      group_by(db) %>%
      dplyr::mutate(line = ifelse(duplicated(line), NA, line))
    
    > df_bars_long
               db Categories CatProportions line
    1  2021-01-01         b1           0.25    0
    2  2021-01-01         b2           0.60   NA
    3  2021-01-01         b3           0.15   NA
    4  2021-02-01         b1           0.45    1
    ..         ..         ..             ..   ..
    

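    Then, the plot. Setting yaxis = "y2" on the line trace assigns it to a second y-axis, and yaxis2 in layout() overlays that axis on the first one and places it on the right: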

    plot_so2 <- plot_ly(data = df_bars_long,
                        x = ~db, y = ~CatProportions,
                        color = ~Categories,
                        type = "bar") %>%
      add_lines(y = ~line,
                name = "my line",
                line = list(color = '#000000'),
                showlegend = TRUE,
                yaxis = "y2") %>%
      layout(title = 'My plot title',
             xaxis = list(title = 'db'),
             yaxis = list(title = 'CatProportions'),
             barmode = 'stack',
             yaxis2 = list(overlaying = "y",
                           side = "right",
                           range = range(na.omit(df_bars_long$line))))
    > plot_so2
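
As a possible alternative to the merge step, the line can be taken straight from df_line by giving the line trace its own data. This is only a sketch: it assumes that the `data` and `inherit` arguments of add_lines() behave as documented, so that `inherit = FALSE` stops the line from picking up the `color = ~Categories` mapping of the bars. The names `plot_long` and the yaxis2 title are illustrative and not part of the original answer.

```r
library(tidyr)
library(plotly)

# Same example data as in the question
some_dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"))
df_bars <- data.frame(db = some_dates,
                      b1 = c(0.25, 0.45, 0.65, 0.75),
                      b2 = c(0.60, 0.40, 0.20, 0.10),
                      b3 = c(0.15, 0.15, 0.15, 0.15))
df_line <- data.frame(line_dates = some_dates, line = c(0, 1, 2, 3))

# Long form of the bar data: one row per date and category
df_bars_long <- pivot_longer(df_bars, !db,
                             names_to = "Categories",
                             values_to = "CatProportions")

plot_long <- plot_ly(data = df_bars_long,
                     x = ~db, y = ~CatProportions,
                     color = ~Categories,
                     type = "bar") %>%
  # The line trace uses df_line directly and does not inherit the bar mappings
  add_lines(data = df_line,
            x = ~line_dates, y = ~line,
            name = "my line",
            line = list(color = "#000000"),
            yaxis = "y2",
            inherit = FALSE) %>%
  layout(title = "My plot title",
         xaxis = list(title = "db"),
         yaxis = list(title = "CatProportions"),
         yaxis2 = list(title = "line", overlaying = "y", side = "right"),
         barmode = "stack",
         showlegend = TRUE)
```

With this variant df_bars_long needs no `line` column, so the duplicated()/NA step is not required.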
    

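
For the primary question (the wide-form code), the same two settings should carry over: `yaxis = "y2"` on the line trace and a `yaxis2` entry in layout(). The following is a minimal sketch under that assumption, not a tested rewrite of the original function. The name `plot_wide`, the `bar_cols` helper, and the yaxis2 title are illustrative. It starts from an empty plot_ly() and selects the bar columns by name rather than by position.

```r
library(plotly)

# Same example data as in the question
some_dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"))
df_bars <- data.frame(db = some_dates,
                      b1 = c(0.25, 0.45, 0.65, 0.75),
                      b2 = c(0.60, 0.40, 0.20, 0.10),
                      b3 = c(0.15, 0.15, 0.15, 0.15))
df_line <- data.frame(line_dates = some_dates, line = c(0, 1, 2, 3))

# Every column except the date column becomes one bar trace
bar_cols <- setdiff(colnames(df_bars), "db")

plot_wide <- plot_ly()
for (col_name in bar_cols) {
  plot_wide <- add_trace(plot_wide,
                         x = df_bars$db,
                         y = df_bars[[col_name]],
                         type = "bar",
                         name = col_name)
}

plot_wide <- plot_wide %>%
  # The line trace is assigned to the second y-axis
  add_trace(x = df_line$line_dates,
            y = df_line$line,
            type = "scatter", mode = "lines", name = "my line",
            line = list(color = "#000000"),
            yaxis = "y2") %>%
  layout(title = "My plot title",
         xaxis = list(title = "db"),
         yaxis = list(title = "CatProportions"),
         # Overlay the second axis on the first and draw it on the right
         yaxis2 = list(title = "line", overlaying = "y", side = "right"),
         barmode = "stack",
         showlegend = TRUE)
```

Because the columns are looked up by name, the loop does not depend on how many bar columns are passed in.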