How to generate weighted mean plot using ggplot2?


I was able to plot the yearly average of the variable lny_10 using the following code:

library(ggplot2)

p1 <- ggplot(df, aes(x = year, y = lny_10)) +
  scale_x_continuous(breaks = c(1991, 1997, 2000, 2003, 2011), limits = c(1991, 2011)) +
  theme_bw() +
  stat_summary(geom = "line", fun = mean)  # fun.y is deprecated since ggplot2 3.3.0


On the same plot, I want to add a second trend line showing the weighted average of the same variable, where the weights are the sum of lnl in each industry, so that the new line reflects the weight of lnl in each industry (Manufacturing or Fishery). In other words, if the sum of lnl in the manufacturing sector is greater than in the fishery sector, the manufacturing average of lny_10 should receive more weight.

Any help would be appreciated!

The sample data is the following:

df <- structure(list(firmid = structure(c("016090", "002070", "009270",
"007700", "005800", "014990", "001460", "001460", "005800", "014990"
), format.stata = "%-6s"), year = structure(c(1992, 1992, 1992,
1992, 1992, 1992, 1992, 1993, 1993, 1993), format.stata = "%9.0g"),
    lny_10 = structure(c(24.0853042602539, 24.2753143310547,
    24.1893978118896, 22.7417297363281, 24.0077304840088, 24.0432777404785,
    24.6088676452637, 24.6565208435059, 23.8993816375732, 24.2486095428467
    ), format.stata = "%9.0g"), lnl = structure(c(6.81234502792358,
    7.56631088256836, 7.19368600845337, 5.48063898086548, 7.38398933410645,
    6.63331842422485, 7.81439971923828, 7.72621250152588, 7.33040523529053,
    6.74288082122803), format.stata = "%9.0g"),
    industry = structure(c("Manufacturing", "Manufacturing", "Manufacturing",
    "Manufacturing", "Manufacturing", "Fishery", "Fishery", "Fishery", "Fishery", "Fishery"
    ), label = "classification", format.stata = "%-51s")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • Calculate the weighted mean of lny_10 (weighted by lnl) separately for each year and industry, then join it back to the original data frame before plotting.

    library(dplyr)
    library(ggplot2)
    
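    # Weighted mean of lny_10 (weights: lnl) for each year and industry
    dfweights <- df %>%
      group_by(year, industry) %>%
      summarise(lny_wmean = weighted.mean(lny_10, lnl), .groups = "drop")

    # Attach the weighted means to the firm-level rows
    df2 <- left_join(df, dfweights, by = c("year", "industry"))

    df2 %>%
      ggplot() +
      # red: unweighted yearly mean
      stat_summary(aes(x = year, y = lny_10), geom = "line", fun = mean, colour = "red") +
      theme_bw() +
      # blue: raw firm-level values
      geom_line(aes(x = year, y = lny_10), colour = "blue") +
      # green: weighted mean per year and industry
      geom_line(aes(x = year, y = lny_wmean), colour = "green")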
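The question can also be read as asking for a single weighted line per year, in which each industry's mean of lny_10 is weighted by that industry's total lnl. The following is a minimal sketch of that reading, not the answer's method. The names by_industry, yearly, lny_mean and lnl_sum are hypothetical, and the data frame repeats the question's sample rows rounded to two decimals so that the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Sample rows from the question, rounded to two decimals for brevity
df <- data.frame(
  year     = c(rep(1992, 7), rep(1993, 3)),
  lny_10   = c(24.09, 24.28, 24.19, 22.74, 24.01, 24.04, 24.61, 24.66, 23.90, 24.25),
  lnl      = c(6.81, 7.57, 7.19, 5.48, 7.38, 6.63, 7.81, 7.73, 7.33, 6.74),
  industry = c(rep("Manufacturing", 5), rep("Fishery", 5))
)

# Step 1: per year and industry, the plain mean of lny_10 and the total lnl
by_industry <- df %>%
  group_by(year, industry) %>%
  summarise(lny_mean = mean(lny_10), lnl_sum = sum(lnl), .groups = "drop")

# Step 2: per year, average the industry means using lnl_sum as weights
# (assumption: this is the weighting the question describes)
yearly <- by_industry %>%
  group_by(year) %>%
  summarise(lny_wmean = weighted.mean(lny_mean, lnl_sum), .groups = "drop")

# Step 3: overlay the weighted line on the unweighted yearly mean
p <- ggplot(df, aes(x = year, y = lny_10)) +
  stat_summary(aes(colour = "Unweighted mean"), geom = "line", fun = mean) +
  geom_line(data = yearly, aes(y = lny_wmean, colour = "Weighted mean")) +
  scale_x_continuous(breaks = c(1992, 1993)) +
  labs(colour = NULL) +
  theme_bw()
print(p)
```

Because the weighted values live in their own one-row-per-year data frame, the second line is drawn once per year and no join back to the firm-level rows is needed. In the sample, 1993 contains only Fishery firms, so the two lines coincide in that year.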