I am implementing a function which builds a stacked % barplot with an additional line trace. I can create the stacked bar plot using either a wide-form or a long-form dataframe. The two code sections below produce plots that look essentially the same:
Using a wide-form dataframe:
library(dplyr)
library(tidyr) # provides pivot_longer(), used in the long-form code below
library(plotly) # install.packages("plotly")
# simple example data for SO post
some_dates = c(as.Date('2021-01-01'), as.Date('2021-02-01'),
as.Date('2021-03-01'), as.Date('2021-04-01'))
bar1 = c(0.25,0.45,0.65,0.75)
bar2 = c(0.60,0.40,0.20,0.10)
bar3 = c(0.15,0.15,0.15,0.15)
line_data = c(0,1,2,3)
# wide form dataframe
df_bars = data.frame("db" = some_dates, "b1" = bar1,
"b2" = bar2, "b3" = bar3)
df_line = data.frame("line_dates" = some_dates, "line" = line_data)
plot_so1 = plot_ly(x = df_bars$db,
y = df_bars[[colnames(df_bars)[2]]],
type = 'bar',
name = colnames(df_bars)[2]) %>%
layout(title = 'My plot title',
xaxis = list(title = 'db'),
yaxis = list(title = 'CatProportions'),
barmode = 'stack',
showlegend = TRUE)
# Now loop through the rest of the columns except for the ones already used.
# This is done because "in the wild", the plot is being built in a function
# that has data which is passed to it so the number and names of the columns
# that are used to build the plot are not known in advance.
# Iterate over the columns of df_bars (not the number of dates).
for (col_index in 3:ncol(df_bars)) {
plot_so1 =
add_trace(plot_so1,
x = df_bars$db,
y = df_bars[[colnames(df_bars)[col_index]]],
name = colnames(df_bars)[col_index])
}
Using a long-form dataframe:
## long form of dataframe #########################
df_bars_long = df_bars %>%
pivot_longer(!db, names_to = "Categories", values_to = "CatProportions")
# build same plot from long form dataframe
plot_so2 = plot_ly(data = df_bars_long,
x = ~db, y = ~CatProportions,
color = ~Categories,
type = "bar") %>%
layout(barmode = "stack")
## above works, now try to add the line trace #####
plot_so2 = plot_ly(data = df_bars_long,
x = ~db, y = ~CatProportions,
color = ~Categories,
type = "bar") %>%
# add_trace(x = df_line$line_dates,
# y = df_line$line,
# type = 'scatter', mode = 'lines', name = 'my line',
# line = list(color = '#000000')) %>%
layout(title = 'My plot title',
xaxis = list(title = 'db'),
yaxis = list(title = 'CatProportions'),
barmode = 'stack',
showlegend = TRUE)
I understand that using the long form to create plots like this is best practice. I show both methods above because I want to add a line trace using data from another dataframe, which has one column for the x values and one column for the y values. So far I have only been able to add this trace with the wide form, by appending the following code segment to the wide-form code:
plot_so1 = add_trace(plot_so1,
x = df_line$line_dates,
y = df_line$line,
type = 'scatter', mode = 'lines', name = 'my line',
line = list(color = '#000000'))
This produces the following plot:
My primary question is: how do I create a secondary y-axis for the line trace in the wide-form dataframe code? My secondary question is: can the final plot I'm looking for be built with the long-form dataframe and, if so, how?
This post got me started on this problem:
Stacked Bar Chart with Line Chart not working in R with plotly
but it didn't involve a stacked barplot, which seems to make life more interesting.
This part is unchanged:
some_dates = c(as.Date('2021-01-01'), as.Date('2021-02-01'),
as.Date('2021-03-01'), as.Date('2021-04-01'))
bar1 = c(0.25,0.45,0.65,0.75)
bar2 = c(0.60,0.40,0.20,0.10)
bar3 = c(0.15,0.15,0.15,0.15)
line_data = c(0,1,2,3)
# wide form dataframe
df_bars = data.frame("db" = some_dates, "b1" = bar1,
"b2" = bar2, "b3" = bar3)
df_line = data.frame("line_dates" = some_dates, "line" = line_data)
If we want to include a line in the plot, we need to know the y values of the line at each date, so we merge df_line with df_bars_long using merge():
df_bars_long = df_bars %>%
pivot_longer(!db, names_to = "Categories", values_to = "CatProportions") %>%
merge(df_line, by.y = "line_dates", by.x = "db") %>%
group_by(db) %>%
dplyr::mutate(line = ifelse(duplicated(line), NA, line))
> df_bars_long
db Categories CatProportions line
1 2021-01-01 b1 0.25 0
2 2021-01-01 b2 0.60 NA
3 2021-01-01 b3 0.15 NA
4 2021-02-01 b1 0.45 1
.. .. .. .. ..
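A note on the group_by(db) step above: duplicated() is evaluated within each date, so a line value that happens to repeat across dates is not masked. The sketch below illustrates this on a made-up frame (`demo` and its values are hypothetical, not taken from the post) and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: two dates, three category rows each,
# and the SAME line value (1) on both dates.
demo <- data.frame(
  db   = rep(as.Date(c("2021-01-01", "2021-02-01")), each = 3),
  line = rep(c(1, 1), each = 3)
)

# Grouped masking: keeps the first line value of every date.
masked_grouped <- demo %>%
  group_by(db) %>%
  mutate(line = ifelse(duplicated(line), NA, line)) %>%
  ungroup()

# Ungrouped masking: the second date loses its value entirely,
# because 1 was already seen on the first date.
masked_ungrouped <- demo %>%
  mutate(line = ifelse(duplicated(line), NA, line))

masked_grouped$line    # 1 NA NA 1 NA NA
masked_ungrouped$line  # 1 NA NA NA NA NA
```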
Then, the plot:
plot_so2 <- plot_ly(data = df_bars_long,
x = ~db, y = ~CatProportions,
color = ~Categories,
type = "bar") %>%
add_lines(y = ~line,
name = "my line",
line = list(color = '#000000'),
showlegend = TRUE,
yaxis = "y2") %>%
layout(title = 'My plot title',
xaxis = list(title = 'db'),
yaxis = list(title = 'CatProportions'),
barmode = 'stack',
yaxis2 = list(overlaying = "y",
side = "right", range = range(na.omit(df_bars_long$line))))
> plot_so2
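For the primary question, the same secondary-axis idea appears to carry over to the wide-form code: assign the line trace to yaxis = "y2" and describe that axis with yaxis2 in layout(). The following is a minimal sketch, not a definitive implementation. It reuses the example data from the question; the names `bar_cols` and `plot_wide` and the "line" axis title are introduced here for illustration only.

```r
library(plotly)

# Example data from the question
some_dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"))
df_bars <- data.frame(db = some_dates,
                      b1 = c(0.25, 0.45, 0.65, 0.75),
                      b2 = c(0.60, 0.40, 0.20, 0.10),
                      b3 = c(0.15, 0.15, 0.15, 0.15))
df_line <- data.frame(line_dates = some_dates, line = c(0, 1, 2, 3))

# Every column except the date column becomes one bar trace,
# so the number and names of the columns need not be known in advance.
bar_cols <- setdiff(colnames(df_bars), "db")

plot_wide <- plot_ly()
for (col in bar_cols) {
  plot_wide <- add_trace(plot_wide,
                         x = df_bars$db,
                         y = df_bars[[col]],
                         type = "bar",
                         name = col)
}

# The line trace comes from the separate dataframe and is bound to the
# secondary axis via yaxis = "y2"; yaxis2 in layout() configures that axis.
plot_wide <- plot_wide %>%
  add_trace(x = df_line$line_dates,
            y = df_line$line,
            type = "scatter", mode = "lines", name = "my line",
            line = list(color = "#000000"),
            yaxis = "y2") %>%
  layout(title = "My plot title",
         xaxis = list(title = "db"),
         yaxis = list(title = "CatProportions"),
         yaxis2 = list(title = "line", overlaying = "y", side = "right"),
         barmode = "stack",
         showlegend = TRUE)

plot_wide
```

Because the line trace takes its x and y directly from df_line, this variant needs no merge with the bar data.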