I have a dataframe (sample below) with 3 columns. My goal is to have the variable "Return" on the y-axis and "BetaRealized" on the x-axis. Based on that, I would like to have two regression lines grouped by "SML", e.g. one regression line for the two "Theoretical" values and one for the 10 "Empirical" values. Preferably I would like to use ggplot2.
I've looked through several other questions but I wasn't able to find one that fits my case. As I am very new to R, I would greatly appreciate any help. Feel free to help me improve my question for future users if necessary.
Reproducible data sample:
structure(list(SML = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L), .Label = c("Empirical", "Theoretical"), class = "factor"),
Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
0.00704131096883141, 0.00471917614445859, 0), BetaRealized = c(0.42574984058487,
0.576898009418581, 0.684024167075167, 0.763551381826944,
0.833875797322081, 0.902738972263857, 0.976227211834564,
1.06544414896672, 1.19436401770255, 1.50932083346054, 0.893219438045588,
0)), class = "data.frame", row.names = c(NA, -12L))
Following AntoniosK's comment, the solution is to use geom_smooth together with a color aesthetic, as shown here. First, store your sample data in a data frame:
df<-data.frame(structure(list(SML = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L), .Label = c("Empirical", "Theoretical"), class = "factor"),
Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
0.00704131096883141, 0.00471917614445859, 0), BetaRealized = c(0.42574984058487,
0.576898009418581, 0.684024167075167, 0.763551381826944,
0.833875797322081, 0.902738972263857, 0.976227211834564,
1.06544414896672, 1.19436401770255, 1.50932083346054, 0.893219438045588,
0)), class = "data.frame", row.names = c(NA, -12L)))
Next, load ggplot2 and call ggplot like this:
library(ggplot2)
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point() + geom_smooth(method = lm, se = FALSE)
The output will look like this: graph
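By default, geom_smooth draws each line only over its own group's x-range, so the "Theoretical" line stops at that group's largest BetaRealized value. If you would rather have both lines span the whole x-axis, one option is the fullrange argument of geom_smooth. A minimal sketch, assuming ggplot2 is installed; the name p is only for illustration, and the data frame is rebuilt here so the snippet runs on its own:

```r
library(ggplot2)

# Same sample data as in the question, rebuilt with data.frame()
df <- data.frame(
  SML = factor(c(rep("Empirical", 10), rep("Theoretical", 2))),
  Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
             0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
             0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
             0.00704131096883141, 0.00471917614445859, 0),
  BetaRealized = c(0.42574984058487, 0.576898009418581, 0.684024167075167,
                   0.763551381826944, 0.833875797322081, 0.902738972263857,
                   0.976227211834564, 1.06544414896672, 1.19436401770255,
                   1.50932083346054, 0.893219438045588, 0)
)

# fullrange = TRUE extends each fitted line across the plot's x-range
# instead of stopping at the range of that group's own data
p <- ggplot(df, aes(BetaRealized, Return, color = SML)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, fullrange = TRUE)

print(p)
```

Keep in mind that the extended part of each line is an extrapolation beyond that group's data.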
Additionally, you can add the regression equation using the package ggpubr:
library(ggpubr)
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point() + stat_smooth(method = lm, se = FALSE) +
stat_regline_equation()
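If you want to check the numbers behind the displayed equations, you can fit the same per-group models directly with lm from base R. A minimal sketch; the names fits and coefs are only for illustration, and the data frame is rebuilt here so the snippet runs on its own:

```r
# Same sample data as in the question, rebuilt with data.frame()
df <- data.frame(
  SML = factor(c(rep("Empirical", 10), rep("Theoretical", 2))),
  Return = c(0.00136162543341773, 0.00327371856919072, 0.00402550498386094,
             0.00514512870557883, 0.00491788632261087, 0.00501053666090353,
             0.00485590289408263, 0.00576880451680399, 0.00579134238930521,
             0.00704131096883141, 0.00471917614445859, 0),
  BetaRealized = c(0.42574984058487, 0.576898009418581, 0.684024167075167,
                   0.763551381826944, 0.833875797322081, 0.902738972263857,
                   0.976227211834564, 1.06544414896672, 1.19436401770255,
                   1.50932083346054, 0.893219438045588, 0)
)

# Fit Return ~ BetaRealized separately within each SML group
fits <- lapply(split(df, df$SML), function(d) lm(Return ~ BetaRealized, data = d))

# One row per group, with the intercept and slope of its fitted line
coefs <- t(sapply(fits, coef))
print(coefs)
```

Because the "Theoretical" group has only two points, its line passes through both of them exactly: the intercept is 0 (up to floating-point error) and the slope is the Return of the non-zero point divided by its BetaRealized.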
Finally, depending on your objective, it may be useful to use facet_wrap to show each category in its own panel:
ggplot(df, aes(BetaRealized, Return, color = SML)) + geom_point()+
stat_smooth(method=lm, se=FALSE)+ facet_wrap(~SML)+
stat_regline_equation()
The image will look like this: graph2