
R: ggplot2 multiple regression lines grouped by variable


I have a data frame (sample below) with three columns. I want the variable "Return" on the y-axis and "BetaRealized" on the x-axis, with two regression lines grouped by "SML", i.e. one regression line through the two "Theoretical" values and one through the ten "Empirical" values. Preferably I would like to use ggplot2.

I've looked through several other questions but I wasn't able to find one that fits my case. As I am very new to R, I would greatly appreciate any help. Feel free to help me improve my question for future users if necessary.

Reproducible data sample:

structure(list(SML = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L), .Label = c("Empirical", "Theoretical"), class = "factor"), 
    Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094, 
    0.00514512870557883, 0.00491788632261087, 0.00501053666090353, 
    0.00485590289408263, 0.00576880451680399, 0.00579134238930521, 
    0.00704131096883141, 0.00471917614445859, 0), BetaRealized = c(0.42574984058487, 
    0.576898009418581, 0.684024167075167, 0.763551381826944, 
    0.833875797322081, 0.902738972263857, 0.976227211834564, 
    1.06544414896672, 1.19436401770255, 1.50932083346054, 0.893219438045588, 
    0)), class = "data.frame", row.names = c(NA, -12L))

Solution

  • Following AntoniosK's comment, the solution is to use geom_smooth with the color aesthetic mapped to SML. First, turn your sample data into a data frame:

    df<-data.frame(structure(list(SML = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L), .Label = c("Empirical", "Theoretical"), class = "factor"), 
    Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094, 
    0.00514512870557883, 0.00491788632261087, 0.00501053666090353, 
    0.00485590289408263, 0.00576880451680399, 0.00579134238930521, 
    0.00704131096883141, 0.00471917614445859, 0), BetaRealized = c(0.42574984058487, 
    0.576898009418581, 0.684024167075167, 0.763551381826944, 
    0.833875797322081, 0.902738972263857, 0.976227211834564, 
    1.06544414896672, 1.19436401770255, 1.50932083346054, 0.893219438045588, 
    0)), class = "data.frame", row.names = c(NA, -12L)))
    

    Next, load ggplot2 and build the plot like this:

    library(ggplot2)
    ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
    

    The result is a scatter plot with one fitted line per SML group (figure: graph).
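
    A note on where color = SML is mapped, since that is what makes the grouping work. The sketch below is my own illustration, not part of the original answer: it rebuilds the sample data, then compares mapping color in ggplot() against mapping it only in geom_point(). The names p_grouped, p_pooled and n_lines are made up for this example, and formula = y ~ x only spells out the default.

    ```r
    library(ggplot2)

    # Rebuild the question's sample data so the block runs on its own.
    df <- data.frame(
      SML = factor(c(rep("Empirical", 10), rep("Theoretical", 2))),
      Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
                 0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
                 0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
                 0.00704131096883141, 0.00471917614445859, 0),
      BetaRealized = c(0.42574984058487, 0.576898009418581, 0.684024167075167,
                       0.763551381826944, 0.833875797322081, 0.902738972263857,
                       0.976227211834564, 1.06544414896672, 1.19436401770255,
                       1.50932083346054, 0.893219438045588, 0)
    )

    # color mapped in ggplot(): every layer inherits it,
    # so geom_smooth() fits one line per SML level.
    p_grouped <- ggplot(df, aes(BetaRealized, Return, color = SML)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

    # color mapped only in geom_point(): geom_smooth() sees no grouping
    # and fits a single line through all 12 points.
    p_pooled <- ggplot(df, aes(BetaRealized, Return)) +
      geom_point(aes(color = SML)) +
      geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

    # Count the distinct groups in the smooth layer (layer 2) of a plot.
    n_lines <- function(p) length(unique(layer_data(p, 2)$group))
    n_lines(p_grouped)
    n_lines(p_pooled)
    ```

    If the plot shows a single line where two were expected, the color (or group) mapping is probably not reaching geom_smooth().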

    Additionally, you can add the regression equation with stat_regline_equation from the ggpubr package:

    library(ggpubr)
    ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point() + stat_smooth(method = "lm", se = FALSE) +
    stat_regline_equation()
    

    Finally, depending on your objective, it may be useful to add facet_wrap to give each category its own panel:

    ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point()+ 
        stat_smooth(method=lm, se=FALSE)+ facet_wrap(~SML)+
        stat_regline_equation()
    

    The result looks like this, with one panel per SML group (figure: graph2).
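
    To check the numbers behind the lines, the same fits can be reproduced without ggplot2. This is a minimal sketch in base R, assuming the smoother fits Return ~ BetaRealized separately within each SML group, which is what method = lm does with its default formula y ~ x. The names fits and coefs are chosen here for illustration.

    ```r
    # Rebuild the question's sample data so the block runs on its own.
    df <- data.frame(
      SML = factor(c(rep("Empirical", 10), rep("Theoretical", 2))),
      Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
                 0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
                 0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
                 0.00704131096883141, 0.00471917614445859, 0),
      BetaRealized = c(0.42574984058487, 0.576898009418581, 0.684024167075167,
                       0.763551381826944, 0.833875797322081, 0.902738972263857,
                       0.976227211834564, 1.06544414896672, 1.19436401770255,
                       1.50932083346054, 0.893219438045588, 0)
    )

    # Fit Return ~ BetaRealized separately within each SML group.
    fits <- lapply(split(df, df$SML), function(d) lm(Return ~ BetaRealized, data = d))

    # One row per group: intercept and slope of the fitted line.
    coefs <- t(sapply(fits, coef))
    colnames(coefs) <- c("intercept", "slope")
    print(coefs)
    ```

    Because the "Theoretical" group has only two points, one of them at the origin, its line passes through both exactly: the intercept is zero and the slope is simply Return divided by BetaRealized for the non-zero point.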