My data:
library(tidyverse)
library(lme4)
myData <- structure(list(Subjects = c(4L, 3L, 5L, 1L, 9L, 6L, 10L, 2L,
8L, 7L), Gene1 = c(0.318630087617032, -0.58179068471591, 0.714532710891568,
-0.825259425862769, -0.359862131395465, 0.0898861437775305, 0.0962744602851301,
-0.201633952183354, 0.739840499878431, 0.123379501088869), Variant1 = c(1L,
0L, 1L, 2L, 2L, 1L, 0L, 1L, 2L, 0L), Variant2 = c(0L, 0L, 2L,
2L, 0L, 2L, 2L, 2L, 2L, 0L), Variant3 = c(1L, 1L, 0L, 2L, 0L,
1L, 1L, 1L, 2L, 1L), Variant4 = c(1L, 2L, 1L, 0L, 0L, 1L, 0L,
2L, 1L, 1L), Age = c(81L, 60L, 85L, 87L, 76L, 78L, 88L, 64L,
90L, 75L), Sex = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L), RIN = c(6L,
6L, 8L, 6L, 8L, 7L, 8L, 7L, 7L, 6L), ABG = structure(c(4L, 5L,
8L, 3L, 6L, 2L, 3L, 4L, 7L, 1L), .Label = c("F1", "F10", "F2",
"F3", "F4", "F5", "F6", "F8"), class = "factor")), row.names = c(NA,
-10L), class = "data.frame", .Names = c("Subjects", "Gene1",
"Variant1", "Variant2", "Variant3", "Variant4", "Age", "Sex",
"RIN", "ABG"))
myData
Subjects Gene1 Variant1 Variant2 Variant3 Variant4 Age Sex RIN ABG
1 4 0.31863009 1 0 1 1 81 0 6 F3
2 3 -0.58179068 0 0 1 2 60 1 6 F4
3 5 0.71453271 1 2 0 1 85 0 8 F8
4 1 -0.82525943 2 2 2 0 87 1 6 F2
5 9 -0.35986213 2 0 0 0 76 0 8 F5
6 6 0.08988614 1 2 1 1 78 1 7 F10
7 10 0.09627446 0 2 1 0 88 0 8 F2
8 2 -0.20163395 1 2 1 2 64 0 7 F3
9 8 0.73984050 2 2 2 1 90 1 7 F6
10 7 0.12337950 0 0 1 1 75 1 6 F1
Gene1
is my dependent variable and Variant1
, Variant2
, Variant3
and Variant4
are my independent variables. Age
, Sex
, RIN
and ABG
are my covariates. Using the tidyverse framework (broom/dplyr/purrr/map) I'd like to iterate through Variant1:Variant4
performing the following linear regressions using a linear mixed model:
lmer(Gene1~Variant1+Age+Sex+RIN+(1|ABG), myData)
for Variant1,
lmer(Gene1~Variant2+Age+Sex+RIN+(1|ABG), myData)
for Variant2, so on ...
At the end, I'd like generate a results table with beta coefficients (Estimate), Std.Err and pValues for all Variant* (possibly using tidy/augment/glance??).
PS. The number of Variant* my vary.
Thank you!
For your problem you can gather()
in order to utilize group_split()
on all the different kinds of variants. From that point we can iterate over each split data.frame and run the linear model. Inside of the map()
we'll broom::tidy()
the data and add a column to distinguish the estimates for each model. I used map_df()
to end up with a single dataframe but you can also just use map()
to end up with a list of data.frames.
library(tidyverse)
library(lme4)
library(broom)
dat <- tribble(~Subjects, ~Gene1, ~Variant1, ~Variant2, ~Variant3, ~Variant4, ~Age, ~Sex, ~RIN, ~ABG,
1, -0.82525943, 2, 2, 2, 0, 87, 1, 6, "F2",
2, -0.20163395, 1, 2, 1, 2, 64 , 0 , 7, "F3",
3, -0.58179068, 0, 0, 1, 2, 60 , 1 , 6, "F4",
4, 0.31863009, 1, 0, 1, 1, 81 , 0 , 6, "F3",
5, 0.71453271, 1, 2, 0, 1, 85 , 0 , 8, "F8",
6, 0.08988614, 1, 2, 1, 1, 78 , 1 , 7, "F10",
7, 0.12337950, 0, 0, 1, 1, 75 , 1 , 6, "F1",
8, 0.73984050, 2, 2, 2, 1, 90 , 1 , 7, "F6",
9, -0.35986213, 2, 0, 0, 0, 76 , 0 , 8, "F5",
10, 0.09627446, 0, 2, 1, 0, 88, 0, 8, "F2")
dat %>%
gather(key = "variants", value = "var_value", Variant1:Variant4) %>%
group_split(variants) %>%
map_df(~lmer(Gene1~var_value+Age+Sex+RIN+(1|ABG), data = .x) %>%
tidy() %>%
mutate(variant_group = unique(.x$variants)))
#> # A tibble: 28 x 6
#> term estimate std.error statistic group variant_group
#> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 (Intercept) -3.91 1.96 -2.00 fixed Variant1
#> 2 var_value -0.280 0.120 -2.34 fixed Variant1
#> 3 Age 0.0401 0.00905 4.43 fixed Variant1
#> 4 Sex 0.00107 0.391 0.00274 fixed Variant1
#> 5 RIN 0.161 0.154 1.05 fixed Variant1
#> 6 sd_(Intercept).ABG 0.462 NA NA ABG Variant1
#> 7 sd_Observation.Resid… 0.00234 NA NA Residu… Variant1
#> 8 (Intercept) -3.60 2.74 -1.31 fixed Variant2
#> 9 var_value -0.0625 0.202 -0.309 fixed Variant2
#> 10 Age 0.0295 0.0175 1.69 fixed Variant2
#> # … with 18 more rows
Created on 2019-02-23 by the reprex package (v0.2.1)