How do you repeatedly filter a dataset and then run regressions without writing out the code for each case?
I want to run a linear regression on the mtcars data where the data are all of mtcars, the IV is mtcars$am, and the DV is mtcars$mpg. I then want to use the grouping variable mtcars$gear to create 3 datasets where mtcars$gear is 3, 4, or 5, and then run the regression again on each of these 3 datasets separately.
The long process that I currently use is below.
Unique values of variables of interest:
## variables of interest
unique(mtcars$mpg)
# ---- NOTE: DV is mpg
unique(mtcars$am)
# ---- NOTE: IV is am
unique(mtcars$gear)
# ---- NOTE: grouping variable is gear
Here is the baseline code I used for the regression:
## linear regression with all data
lm__am_on_mpg__mtcars <- lm(mpg ~ am, data=mtcars)
summary(lm__am_on_mpg__mtcars)
I then used the filter() function from the tidyverse (it comes from dplyr) to create 3 datasets, where mtcars$gear is 3, 4, or 5:
### list of filtered datasets
str(mtcars__gear_is_3)
str(mtcars__gear_is_4)
str(mtcars__gear_is_5)
I then created 3 regressions with the same basic structure as the base regression above, but each using the dataset for a different mtcars$gear level.
#### when mtcars__gear_is_3 is dataset used
lm__am_on_mpg__mtcars__gear_is_3 <- lm(mpg ~ am, data=mtcars__gear_is_3)
summary(lm__am_on_mpg__mtcars__gear_is_3)
#### when mtcars__gear_is_4 is dataset used
lm__am_on_mpg__mtcars__gear_is_4 <- lm(mpg ~ am, data=mtcars__gear_is_4)
summary(lm__am_on_mpg__mtcars__gear_is_4)
#### when mtcars__gear_is_5 is dataset used
lm__am_on_mpg__mtcars__gear_is_5 <- lm(mpg ~ am, data=mtcars__gear_is_5)
summary(lm__am_on_mpg__mtcars__gear_is_5)
This seems to work, but it also seems to be a lot of code. I feel this could be accomplished with more concise code. I want to know if I can speed this process up by writing code that:
(A) creates different datasets in a shorter way using the tidyverse filter method
(B) creates different regressions in a shorter way that just swaps the dataset names when appropriate
without having to write all of the code the long way.
Here are my questions: (1) Is this possible to do in R in general? (2) Is this possible with datasets? (2.1) If so, how? (3) Is this possible with regressions? (3.1) If so, how?
====================
Here is the R code that I used to complete this task the long way:
# How do you repeat on filtering datasets and then running regressions in R without writing out individual code?
## dataset of interest
mtcars
### info about dataset
head(mtcars)
str(mtcars)
colnames(mtcars)
## variables of interest
unique(mtcars$mpg)
# ---- NOTE: DV is mpg
unique(mtcars$am)
# ---- NOTE: IV is am
unique(mtcars$gear)
# ---- NOTE: grouping variable is gear
## linear regression with all data
lm__am_on_mpg__mtcars <- lm(mpg ~ am, data=mtcars)
summary(lm__am_on_mpg__mtcars)
## filter data based on mtcars$gear
### loads tidyverse
library(tidyverse)
### when mtcars$gear == 3
#### creates filtered dataset
# ---- NOTE: starting dataset - mtcars
# ---- NOTE: ending dataset - mtcars__gear_is_3
# ---- NOTE: filter variable - gear
# ---- NOTE: filter variable value(s) - 3
##### starting dataset
str(mtcars)
##### unique values of starting dataset$filter
unique(mtcars$gear)
##### filters data into post-filter dataset
mtcars__gear_is_3 <- filter(mtcars, gear == 3)
##### turns post-filter dataset into data frame
mtcars__gear_is_3 <- data.frame(mtcars__gear_is_3)
##### post-filter dataset
str(mtcars__gear_is_3)
##### unique values of post-filter dataset$filter
unique(mtcars__gear_is_3$gear)
### when mtcars$gear == 4
#### creates filtered dataset
# ---- NOTE: starting dataset - mtcars
# ---- NOTE: ending dataset - mtcars__gear_is_4
# ---- NOTE: filter variable - gear
# ---- NOTE: filter variable value(s) - 4
##### starting dataset
str(mtcars)
##### unique values of starting dataset$filter
unique(mtcars$gear)
##### filters data into post-filter dataset
mtcars__gear_is_4 <- filter(mtcars, gear == 4)
##### turns post-filter dataset into data frame
mtcars__gear_is_4 <- data.frame(mtcars__gear_is_4)
##### post-filter dataset
str(mtcars__gear_is_4)
##### unique values of post-filter dataset$filter
unique(mtcars__gear_is_4$gear)
### when mtcars$gear == 5
#### creates filtered dataset
# ---- NOTE: starting dataset - mtcars
# ---- NOTE: ending dataset - mtcars__gear_is_5
# ---- NOTE: filter variable - gear
# ---- NOTE: filter variable value(s) - 5
##### starting dataset
str(mtcars)
##### unique values of starting dataset$filter
unique(mtcars$gear)
##### filters data into post-filter dataset
mtcars__gear_is_5 <- filter(mtcars, gear == 5)
##### turns post-filter dataset into data frame
mtcars__gear_is_5 <- data.frame(mtcars__gear_is_5)
##### post-filter dataset
str(mtcars__gear_is_5)
##### unique values of post-filter dataset$filter
unique(mtcars__gear_is_5$gear)
## regressions where data is filtered by gear
### list of filtered datasets
str(mtcars__gear_is_3)
str(mtcars__gear_is_4)
str(mtcars__gear_is_5)
#### when mtcars__gear_is_3 is dataset used
lm__am_on_mpg__mtcars__gear_is_3 <- lm(mpg ~ am, data=mtcars__gear_is_3)
summary(lm__am_on_mpg__mtcars__gear_is_3)
#### when mtcars__gear_is_4 is dataset used
lm__am_on_mpg__mtcars__gear_is_4 <- lm(mpg ~ am, data=mtcars__gear_is_4)
summary(lm__am_on_mpg__mtcars__gear_is_4)
#### when mtcars__gear_is_5 is dataset used
lm__am_on_mpg__mtcars__gear_is_5 <- lm(mpg ~ am, data=mtcars__gear_is_5)
summary(lm__am_on_mpg__mtcars__gear_is_5)
Maybe you can achieve your goal with something like this:
library(data.table)
dt <- as.data.table(mtcars)
formulas <- paste0("lm(mpg ~ am, data = dt[gear == ", unique(dt[,gear]), "])" )
l <- lapply(formulas, function(x) eval(parse(text=x)))
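It may help to look at the strings being built before they are parsed and evaluated. The sketch below repeats only the paste0() step in base R (so it runs without data.table); gear_values is a name introduced here for illustration and is not part of the code above.

```r
# Rebuild the strings from the snippet above, using base R only.
# gear_values is an illustrative name, not part of the original code.
gear_values <- unique(mtcars$gear)

# paste0() is vectorised, so one template yields one string per gear value.
formulas <- paste0("lm(mpg ~ am, data = dt[gear == ", gear_values, "])")
print(formulas)
```

Note that unique() keeps values in order of first appearance, which for mtcars$gear is 4, 3, 5, so the models in the resulting list follow that order rather than sorted order.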
To see all of the models, just use:
l
or, to see the summary of one of the models:
summary(l[[1]])
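If you would rather not build code as strings and run it through eval(parse()), a base R alternative is to split the data by the grouping variable and fit the model on each piece. This is only a minimal sketch, not the one right way to do it; the names by_gear, models_by_gear, and coef_table are made up for illustration.

```r
# Split mtcars into one data frame per gear value.
# split() names the list elements after the group values: "3", "4", "5".
by_gear <- split(mtcars, mtcars$gear)

# Fit the same model on each subset; the result is a named list of lm objects.
models_by_gear <- lapply(by_gear, function(d) lm(mpg ~ am, data = d))

# Collect the coefficients into a matrix with one row per gear value.
coef_table <- t(sapply(models_by_gear, coef))
print(coef_table)

# Full summary for one subset, looked up by name instead of by position.
summary(models_by_gear[["4"]])
```

Because the list is named, you can pull out a model with models_by_gear[["3"]] without remembering the order. One thing to be aware of with this particular data: am does not vary within gear 3 (all 0) or gear 5 (all 1), so lm() reports NA for the am coefficient in those two subsets; only the gear 4 subset contains both values.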